Application No. 10/602,663 
Attorney Docket No. 3495.0199-01 



REMARKS 

Claims 41-46, 50, 51 are currently pending in this application. 

Applicants have amended claim 41 to include the term "transfer" before the term "vector" in the preamble and to add the phrase "wherein said transfer vector transfers the defined nucleotide sequence into the nucleus of a cell." This amendment is supported, at least, on page 1, lines 2-4, and on page 4, lines 3-2 and line 23, of the translated specification.

In the Final Office Action mailed October 10, 2006, the Office maintained the rejection of claims 41-51 under 35 U.S.C. § 102 as being anticipated by either WO 97/12622 ("Verma") or Parolin et al., "Analysis in Human Immunodeficiency Virus Type 1 Vectors of cis-Acting Sequences That Affect Gene Transfer into Human Lymphocytes," J. Virol., vol. 68, pp. 3888-95 (1994) ("Parolin"). These references, however, do not anticipate the claimed invention, for the following reasons.

Verma 

The Office maintained the rejection under § 102 in light of Verma because it asserted that Verma teaches "a recombinant vector comprising a polynucleotide - the HIV pol vector - that does include cPPT and CTS regions." Office Action mailed October 10, 2006, at page 3. Verma, however, does not teach a "transfer vector" that comprises the cPPT and CTS sequences of pol and that satisfies the limitation "wherein said transfer vector achieves transfer of the transgene or sequence of interest into the nucleus of a cell and forms a cis-acting triplex," as provided in the claimed invention.

The Office is apparently referring to the "first vector" depicted on page 4, lines 22-28, and in Figure 1 ("Vector 2") of Verma, because this vector includes pol sequences. But this vector is a packaging vector in Verma, not a transfer vector. A packaging vector comprises DNA sequences encoding most of the structural viral proteins needed to form infectious viral particles, including pol. In this context, however, pol is incorporated into the final viral particle as a trans-acting factor and so is not able to form the DNA triplex needed for nuclear transport. Instead, the cPPT and CTS sequences of pol must be cis-acting to form a DNA triplex, as provided in independent claim 41.

In Verma, the actual "transfer vector" that provides the cis-acting viral sequences is the "third" vector, as depicted on page 6, lines 24-29, and in Figure 1 (vector 1). This vector, however, does not include pol sequences. Furthermore, this vector is not able to form a DNA triplex.

Because Verma does not disclose a "transfer vector" that "achieves transfer of the transgene or sequence of interest into the nucleus of a cell and forms a cis-acting triplex," it cannot anticipate the claimed invention. Accordingly, Applicants respectfully request that the rejection under 35 U.S.C. § 102 in light of Verma be withdrawn.

Parolin

The Office maintained the rejection under § 102 in light of Parolin because it asserted that "the Parolin vector comprises a pol polynucleotide, which naturally contains the cPPT and CTS regions." Office Action mailed October 10, 2006, at page 4. Applicants disagree with the Office's characterization of Parolin and accordingly traverse this rejection. Contrary to the Office's assessment, Parolin does not disclose the complete pol sequence and does not disclose the cPPT and CTS sequences required by the claimed invention.
Parolin recites: "The v1864 RSN vector contains gag and pol sequences up to the MscI site (nucleotide 2200) and the v2731 RSN vector contains gag and pol sequences up to the PflMI site (nucleotide 3067)." Parolin at p. 3889. Exhibit A is provided to show the positions of the nucleotides closest to those recited in Parolin, at page 20, and the beginning of the cPPT and CTS sequences of the claimed invention, at page 26. As shown, nucleotides 2200 and 3067, which are the end points of the pol sequences in the vectors v1864 RSN and v2731 RSN, respectively, disclosed in Parolin, are upstream of the cPPT sequence.
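By way of illustration only, the following Python sketch shows how the two restriction sites named in Parolin (the MscI site, TGGCCA, at nucleotide 2200 and the PflMI site, CCANNNNNTGG, at nucleotide 3067) can be located in sequence rows of the kind reproduced in the accompanying HXB2 restriction map. The two 60-nucleotide rows are copied from the map's rows ending at positions 2640 and 3540; the function name `find_sites` and the regular-expression approach are illustrative assumptions, not features of DNA Strider or of the exhibit. On these rows the sketch places MscI at 2621 and PflMI at 3488 in the map's numbering, each 421 nucleotides greater than the corresponding Parolin position, which is consistent with the numbering shift discussed in the next paragraph.

```python
import re

# Recognition sequences (5'->3', top strand); N matches any base.
# Both sites are palindromic, so scanning one strand finds every occurrence.
SITES = {
    "MscI": "TGGCCA",
    "PflMI": "CCANNNNNTGG",
}

# Two 60-nt rows copied from the HXB2 map; the suffix is the map
# position of each row's first base.
ROW_2581 = "AAAATTAAAGCCAGGAATGGATGGCCCAAAAGTTAAACAATGGCCATTGACAGAAGAAAA"
ROW_3481 = "AAAAGAACCAGTACATGGAGTGTATTATGACCCATCAAAAGACTTAATAGCAGAAATACA"


def find_sites(seq: str, site: str, first_pos: int = 1) -> list[int]:
    """Return the map positions at which `site` begins in `seq`.

    `first_pos` is the map coordinate of seq[0]. A zero-width lookahead
    is used so that overlapping matches are not skipped.
    """
    pattern = re.compile("(?=" + site.replace("N", "[ACGT]") + ")")
    return [m.start() + first_pos for m in pattern.finditer(seq)]


if __name__ == "__main__":
    print("MscI:", find_sites(ROW_2581, SITES["MscI"], 2581))
    print("PflMI:", find_sites(ROW_3481, SITES["PflMI"], 3481))
```

The lookahead pattern is a design choice for short, possibly overlapping motifs; a full map such as the exhibit would also record cut positions, which this sketch does not attempt.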

Specifically, cPPT is located at the end of the pol sequence, inside the integrase sequence, as described in Charneau et al. 1991 (Exhibit B) at Figure 1. It should be noted that Charneau 1991 relies on numbering based on the DNA, beginning at the R element from the 5' end of the LTR, while Parolin relies on numbering based on the RNA, beginning at the 5' end of the LTR. This difference results in a shift of 421 base pairs between the two numbering systems. Taking this shift into account, cPPT begins at position 4783 in the Charneau numbering, which corresponds to position 4362 in the HXBc2 HIV-1 provirus, the sequence used in Parolin (see Parolin at p. 3889), and CTS ends at position 4902, which corresponds to position 4481 in HXBc2. Thus, the cPPT and CTS sequences begin at position 4362, 1,295 nucleotides downstream of position 3067, the point at which the pol sequences in Parolin end. Parolin, then, does not include cPPT or CTS.
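The position arithmetic in the preceding paragraph can be checked with a short Python sketch. It only subtracts the 421-nucleotide offset stated above from the Charneau coordinates and compares the results with the end points of the gag and pol sequences in Parolin's two vectors; the constant and function names are illustrative, not drawn from either reference.

```python
# Offset between Charneau 1991 (DNA) numbering and the HXBc2 (RNA)
# numbering used in Parolin, as stated in the remarks above.
SHIFT = 421


def charneau_to_hxbc2(pos: int) -> int:
    """Convert a Charneau 1991 coordinate to HXBc2 (Parolin) numbering."""
    return pos - SHIFT


CPPT_START = charneau_to_hxbc2(4783)  # start of cPPT
CTS_END = charneau_to_hxbc2(4902)     # end of CTS

# 3' end points of the gag and pol sequences in Parolin's vectors (p. 3889).
POL_ENDS = {"v1864 RSN": 2200, "v2731 RSN": 3067}

if __name__ == "__main__":
    print(f"cPPT-CTS region in HXBc2 numbering: {CPPT_START}-{CTS_END}")
    for vector, end in POL_ENDS.items():
        print(f"{vector}: pol ends at {end}, {CPPT_START - end} nt upstream of cPPT")
```

Both vectors end well upstream of 4362, the longer v2731 RSN insert by 1,295 nucleotides.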

Because Parolin does not disclose a transfer vector with the cPPT and CTS 
sequences, it does not anticipate the claimed invention. Accordingly, Applicants 
respectfully request that the rejection under 35 U.S.C. § 102(b) in light of Parolin be 
withdrawn. 
Please grant any extensions of time required to enter this response and charge the fee to our Deposit Account No. 06-0916.

Respectfully submitted, 

FINNEGAN, HENDERSON, FARABOW,
GARRETT & DUNNER, L.L.P. 



Dated: March 9, 2007 




Reg. No. 51,863 
Telephone: (202) 408-4382 
Facsimile: (202) 408-4400 
E-mail: deborah.katz@finnegan.com 



### DNA Strider 1.4n ### Monday, February 5, 2007 16:22:16 (US Letter, 100%)
HXB2 -> Restriction Map 

DNA sequence 9721 bp ACTGGAAGGGCT . . . AAATCTCTAGCA linear 

Positions of R. E. sites (sites unique in whole sequence are bold)

>Fal I Afjbol 
>Hin 41 ^Pi3 I 

>HaeIV Wbol Cha I 

>AfJbo II Dpnl Bst Yl 

<HpyhW Tsp 5091 >Bbs X Cha 1 >Alwl 

>Bsrl CvlJI >Bbr 71 EcoRV Bst KTl Bst KTI 

I I I I I I I I I II 

ACTGGAAGGGCTAATTCACTCCCAAAGAAGACAAGATATCCTTGATCTGTGGATCTACCA 60
TGACCTTCCCGATTAAGTGAGGGTTTCTTCTGTTCTATAGGAACTAGACACCTAGATGGT 

I I [• I • I I I I • I •!! 

19 27 35 44 52 

5 13 27 44 51 

27 44 51 

30 44 52 

30 52 
33 52 
Sty D4 1 
Scr Fl 
PspGT 
BstNI 
Hae III 
Cvi JI 
Unb I 
Sau 961 
Fmu I 
Sty D4I 
Scr FI 
Psp GI 

Bst NI ECO RV 

Bsl I Bsa JI Hpy 1881 
Cvi JI Bsa JI >SiinI >rspRI 

I I II I I I I I 

CACACAAGGCTACTTCCCTGATTAGCAGAACTACACACCAGGGCCAGGGGTCAGATATCC 120
GTGTGTTCCGATGAAGGGACTAATCGTCTTGATGTGTGGTCCCGGTCCCCAGTCTATAGG 

I • • ■ I ■III I •! I I 

68 98 108 120 

98 104 111 

98 114 
98 
98 
98 

101 
101 
101 
102 
102 
104 
104 
104 
104 
Hae III

RsaZ CviJl 

>Stsl Csp 61 Hae I 

>FokT Bfal <MnlX 

>Bst F5I Cvi JI >Mbo 11 

Bsll <Bccl Alul <Bsrl Cvi Jl <Earl 

III 11 I I I I III 

ACTGACCTTTGGATGGTGCTACAAGCTAGTACCAGTTGAGCCAGATAAGATAGAAGAGGC 180
TGACTGGAAACCTACCACGATGTTCGATCATGGTCAACTCGGTCTATTCTATCTTCTCCG 

I -I! • • I I 1- I I- • I III • 

126 132 144 152 159 173 

131 144 173 

131 146 176 

131 149 177 

149 178 

178 

Sty D4I 
<SccI >S til 1321 

NIa III Scr FI 

rati >StsX 
Cvi All >Fokl Hpa 11 
Hpy CB4V >Bst F5I 
Mae 111 Cac SI >Stsl Neil 

Cvi Jl Cvi Jl >Fokl <Siml 

>CstMl Alul Drain <CjePI >SstF5I EcoBl 

I III III I I II I I II 

CAATAAAGGAGAGAACACCAGCTTGTTACACCCTGTGAGCCTGCATGGGATGGATGACCC 240
GTTATTTCCTCTCTTGTGGTCGAACAATGTGGGACACTCGGACGTACCCTACCTACTGGG 

I • III- llh II II- I I II- 

186 200 209 

200 

205 



217 


228 


238 


218 


228 


236 


219 


228 


238 


222 


232 




224 


232 


239 


2 24 


232 




224 




238 




229 


238 
238 



Unb I 
Sau96I 
Hae III 
Bfa 1 Fmu I 

>Aci I rail >5th 1321 

Taul HpyCH4IV 
Fnu4HI Pmll Nii 38771 

Bth CI Bsa AI Aval 

<Mnl 1 Cvi Jl <Tsp DTI Cvi Jl 

I III I I II I I 

GGAGAGAGAAGTGTTAGAGTGGAGGTTTGACAGCCGCCTAGCATTTCATCACGTGGCCCG 300
CCTCTCTCTTCACAATCTCACCTCCAAACTGTCGGCGGATCGTAAAGTAGTGCACCGGGC 

• I -Mil- I II I I • 

262 272 285 295 

273 290 297 

273 290 297 

273 291 
274 291 297 

278 295 
295 
295 
295 
<Sts I
<Fok I 
<Bst F5I 
>SfaHX 
>Ssc AI 
Hpy CH4V Rsal 
TseT Hpy 188III 

Fnu4HI Csp 61 Cac 81 

BthCl TatX CvlJT 

<Bbvl Seal AluZ 

CvJ-JT BspEl TaqX 

Alul BsaVIl EsaBC31 MspAlX 

<AceXXX HpaXX HpylSBXXX >Cdl X >BsmFX >Acl 1 

I I I I I I I I I I I I I I I I I 
AGAGCTGCATCCGGAGTACTTCAAGAACTGCTGACATCGAGCTTGCTACAAGGGACTTTC 360
TCTCGACGTAGGCCTCATGAAGTTCTTGACGACTGTAGCTCGAACGATGTTCCCTGAAAG 

III III II II 'I -.Mil * I I 

302 311 321 335 352 360 

303 310 337 360 

303 310 337 

304 315 340 

304 315 340 

304 316 341 

304 310 

306 316 
307 
307 
308 
308 
308 

Mbo I 
Dpn I 
Cha I 
Bst KTI 
<AlwX 

Sty D4 I Hpy 18 81 

ScrFX DdeX 
<MnlX PspGX >BsrX >Bsp CNI 

5tyD4I Bst NI >Bmr X >Mnl X 

ScrFX BsaJX >BsmFX Cvi JX BstYX 

PspGX HaeXXX <5th 1321 Bspl286I 

>BsmFX Bst NI Cvi JI <FauI Ban II AlwHX 

<BseYX BsaJX HaeX <AciX Cac BX >BseMll 

II II II I III I I II llllll 
CGCTGGGGACTTTCCAGGGAGGCGTGGCCTGGGCGGGACTGGGGAGTGGCGAGCCCTCAG 420
GCGACCCCTGAAAGGTCCCTCCGCACCGGACCCGCCCTGACCCCTCACCGCTCGGGAGTC 

I I • I h III ' III I ' I'll llllll 

362 374 385 393 409 416 

366 374 386 393 411 418 

374 386 394 411 

374 388 395 412 419 

374 388 398 415 

379 388 398 416 

388 416 
388 417 

420 
420 
420 
420 
420 
Tse I
Fnu4HI 

BthCX Ddel 

<Bbvl <BspCNI 

Cvi JI <Bse Mil 

AIuI Hpy 188 I 

Pvu 1 1 >Bsr I Wbo I 

Msp All >Bmr X DpnX 

Tse I Rsa X Cha X 

Fnu4HI Csp6X >BsmAX BstKTX 

BthCX rati >BsaX BstYX 

Hpy CH4V >BJbvI Bsl I >SimX Bgl XX 

I MM I M I Ml MM 

ATCCTGCATATAAGCAGCTGCTTTTTGCCTGTACTGGGTCTCTCTGGTTAGACCAGATCT 480
TAGGACGTATATTCGTCGACGAAAAACGGACATGACCCAGAGAGACCAATCTGGTCTAGA 

I * MM - I M I Ml ' -11 IM 

425 434 448 456 475 

434 450 457 475 

434 451 458 476 

434 451 476 

435 453 476 

435 453 476 

436 478 
436 479 
437 479 
437 479 
437 
437 

Cvi JI 
Alu X 
Sac X 

ECOXCRX 
BsplZBSX 
Bsi HKAI 

Sty D4 1 MwoX 
ScrFX MseX CacBX 

PspGX SmlX CviJX 

Bst NI >TspRX >MnlX Alu I 

Bsa JI Bfax <BtsX CviJX HindXXX 

CviJX Ban XX CviJX NlaXV AflXX >Fal X 

II II I II I II I I II I I 

GAGCCTGGGAGCTCTCTGGCTAACTAGGGAACCCACTGCTTAAGCCTCAATAAAGCTTGC 540
CTCGGACCCTCGAGAGACCGATTGATCCCTTGGGTGACGAATTCGGAGTTATTTCGAACG 

II II I • I I ' I II I I - Ml I- 

482 489 498 508 519 533 

484 504 514 523 533 

484 514 525 534 

484 519 534 

484 520 535 

484 539 
489 
489 
489 
489 
490 
490 




Dde I 
>Bsp CNI 
>Bse Mil 

Mbo I 
Dpn I 

<Plel Chal 

KMlyl BstKTX 
>Sth 1321 Hi/3 f I <Alwl 

Smll Bsp 12 8 61 Tsp 4 51 a I >WnI I 

>BpuEI Bine 15801 Waelll Waelll BstYJ 

I II II I I II II 

CTTGAGTGCTTCAAGTAGTGTGTGCCCGTCTGTTGTGTGACTCTGGTAACTAGAGATCCC 600
GAACTCACGAAGTTCATCACACACGGGCAGACAACACACTGAGACCATTGATCTCTAGGG 

I 

541 
541 



1 1 


1 I- 


1 1 II 


1 1 


562 


577 


586 594 




562 


577 


590 


599 


565 


579 


595 






579 


595 






579 


595 








595 








595 










600 








600 








600 



Hxn PlI 



Hha I 


Vpa Kl 


SfoJ 


Unb I 


Nla IV 


Sau 96 


Narl 


Psp 03 


Lpn I 


Fmu I 


Kas I 


Ava II 


Hae II 


Pss I 


Bsa HI 


Ppu MI 


Bbel 


Nla IV 



>rspRI >5th 1321 

<Siml >BtsX BslX Eco O109X 

Hpy 18 81 >TspRX BfaX Ban X >BsmFZ 

II I I II II I II 

TCAGACCCTTTTAGTCAGTGTGGAAAATCTCTAGCAGTGGCGCCCGAACAGGGACCTGAA 660
AGTCTGGGAAAATCAGTCACACCTTTTAGAGATCGTCACCGCGGGCTTGTCCCTGGACTT 

I I • I • -I II II I 'II 

601 616 631 639 651 

604 634 643 651 

635 643 

639 651 

639. 651 

639 651 

639 652 

639 652 

639 652 

639 652 

639 652 

640 652 
640 
Cvi JI
Alul 

Sad HinPll 

Eco XCRl Sell 

Bspl286I BstVl 

Bsi HKAI Mwol 

BanlZ HinPll 
<Mnll >Hgal CacBl Hhal >Bcefl 

>BseRT Hpy 9 91 Cvi Jl Cac 81 

>BplX Tag I <PieI Bss HI I 

>BplX EsaBCSl <Mlyl >Eco51Vil >Bce AI 

>CjePI Bpy 188III Bin f I >Acu I Hhal 

I I II I III I II I III I 

AGCGAAAGGGAAACCAGAGGAGCTCTCTCGACGCAGGACTCGGCTTGCTGAAGCGCGCAC 720
TCGCTTTCCCTTTGGTCTCCTCGAGAGAGCTGCGTCCTGAGCCGAACGACTTCGCGCGTG 

I I II I III 1*11 I ' Ml \' 

674 686 697 708 715 

677 688 697 708 719 

677 688 697 713 

677 689 702 713 

677 690 703 713 719 

680 713 

680 713 

680 714 

680 714 

680 715 
681 
681 

Taul 

Fi3u4HI Rsal 

BthQl >5sp D5I CvlJl 

<Mnll >Bsrl Csp SI Tsp 5091 <Aci I Bfal 

<Mnll <Acil >Hphl Apo 1 Bfal <Mnll 

III III II I I I II 

GGCAAGAGGCGAGGGGCGGCGACTGGTGAGTACGCCAAAAATTTTGACTAGCGGAGGCTA 780
CCGTTCTCCGCTCCCCGCCGCTGACCACTCATGCGGTTTTTAAAACTGATCGCCTCCGAT 

I 'I I'll I II I 'I I I I ' 

726 736 . 745 . 759 768 774 

731 742 750 760 771 778 

736 745 776 

736 750 

736 

<Cdi I 
Tag I 
Esa BC3I 
Cla I 
Mbo I 

<Sth 1321 Dpn I 

<Fau I Cha I 

>Cst MI <Acil Bst KTI Xmn 1 

<BpyAV <Bcc I <Hga I Msel Tsp 5091 <BccI 

II I I III I I II M I 

GAAGGAGAGAGATGGGTGCGAGAGCGTCAGTATTAAGCGGGGGAGAATTAGATCGATGGG 840
CTTCCTCTCTCTACCCACGCTCTCGCAGTCATAATTCGCCCCCTCTTAATCTAGCTACCC 

II -I -I -III- I *lllll I 

781 791 804 813 826 835 

782 817 831 840 

817 831 
818 831 
831 
832 
833 
833 
834 
Sty D4I

Scr Fl 

PspGX 

BstUl 
HaeXlX 
Cvi JI 

Tsp 5091 Hael Msel 
Apol Msel BsaJX rsp509I Cac QX 

II I II I II I 

AAAAAATTCGGTTAAGGCCAGGGGGAAAGAAAAAATATAAATTAAAACATATAGTATGGG 900
TTTTTTAAGCCAATTCCGGTCCCCCTTTCTTTTTTATATTTAATTTTGTATATCATACCC 

II • I II I • • II • I 

844 852 858 880 900 

845 855 882 

856 
856 
858 
858 
858 
858 

Hae III 
Cvl JI 
Sty D4I 

Scr FX SfcX 
CviJX PspGX Cvi J I 

Aiu I Tfl X Bst NI <Hpy AV 

>rthlllll BfaX HintX MseX HaeX Hpy 1881 

I II I I I II I I I I 

CAAGCAGGGAGCTAGAACGATTCGCAGTTAATCCTGGCCTGTTAGAAACATCAGAAGGCT 960
GTTCGTCCCTCGATCTTGCTAAGCGTCAATTAGGACCGGACAATCTTTGTAGTCTTCCGA 

I II I- I ' I II • -I I II- 

901 912 919 928 935 951 

910 919 933 954 

910 933 957 

933 959 
933 

936 
936 

<JE:co5 7MI >MboXX 
<ACU X Mbo I 

<CjeI >HpyAV DpnX Mbo X 

<CjePX <StsX ChaX DpnX 

>Bsp 24X >BsmFX <FokX Bst KTX ChaX 

HpyBX >BsrX CviJX <Bst F5X >AlwX BstKTX 

AccX >BmrX Alu X >Bcc X Hpy 18 81 Hpy 1 8 8 1 Dde X 

I II .11 I II II I II I I I 1 

GTAGACAAATACTGGGACAGCTACAACCATCCCTTCAGACAGGATCAGAAGAACTTAGAT 1020
CATCTGTTTATGACCCTGTCGATGTTGGTAGGGAAGTCTGTCCTAGTCTTCTTGAATCTA 

I II -I I I- II • II I • II I I • I I • 

961 971 979 987 995 1005 1014 

961 971 979 988 1002 1018 

964 974 988 1003 1018 

964 988 1003 1018 

965 992 1003 1018 

993 1003 
993 1008 
>Sfa NI 
>BscAX 

HpyCH4III >Afn2 I HpyCH4V 

I I II 

CATTATATAATACAGTAGCAACCCTCTATTGTGTGCATCAAAGGATAGAGATAAAAGACA 1080
GTAATATATTATGTCATCGTTGGGAGATAACACACGTAGTTTCCTATCTCTATTTTCTGT 
. I . I • II • 

1032 1043 1054 

1055 
1055 
<Sap I

CviJX >Mbo Zl 

Sty I HlndllT <Barl Cac SI 

Bsa Jl Alul <Mnll Mwol >Tthlllll 

I II II I II 

CCAAGGAAGCTTTAGACAAGATAGAGGAAGAGCAAAACAAAAGTAAGAAAAAAGCACAGC 1140
GGTTCCTTCGAAATCTGTTCTATCTCCTTCTCGTTTTGTTTTCATTCTTTTTTCGTGTCG 

III- • I I • • -III 

1081 1088 1104 1134 1140 

1081 1087 1107 1139 

1088 1107 

1107 

Cvi Jl 
Alu I 
Pvu II 
MspAll 
Tse 1 
Fnu 4HI 
Bth CI 
>Bbvl 
AlwVll 
Tsel 
Fnu Am 

BthCl Tsp 5091 >Bsgl 

>Bbvl Cvi Jl Sfcl Hpy CH4V 

II III III II 

AAGCAGCAGCTGACACAGGACACAGCAATCAGGTCAGCCAAAATTACCCTATAGTGCAGA 1200
TTCGTCGTCGACTGTGTCCTGTGTCGTTAGTCCAGTCGGTTTTAATGGGATATCACGTCT 

II III • • • I • I I- II • 

1143 1176 1189 1195 

1143 1182 1194 

1143 
1143 
1144 
1146 
1146 
1146 
1146 
1147 
1147 
1148 
1148 
StyD4I 

ScrFl NIa III 

PspGI Fat I 

Bst NI PpulOI 

Bsa Jl Nsi I 

<Stsl HaelXX BfaX BfrBX 

<FokX RsaX CviJX <SspD5X MseX Cvi All 

<Bst F5I Csp 61 Hael <Hph X DraX Hpy CH4V 

II I II II II II I 

ACATCCAGGGGCAAATGGTACATCAGGCCATATCACCTAGAACTTTAAATGCATGGGTAA 1260
TGTAGGTCCCCGTTTACCATGTAGTCCGGTATAGTGGATCTTGAAATTTACGTACCCATT 

II- I • II • I I • II II I 

1202 1218 1225 1233 1244 1250 

1202 1218 1226 1233 1245 1252 

1202 1226 1237 1249 

1205 1249 

1205 1249 

1205 1252 

1205 1252 

1205 



-> 
Nla III

Xmn I Fat I 

<HpyAV CviAll 
>Mbo II <Ba e I <^py AV 

<EarZ Cvi Jl CvlJl <Bae I HpylBBX 

III I II II 

AAGTAGTAGAAGAGAAGGCTTTCAGCCCAGAAGTGATACCCATGTTTTCAGCATTATCAG 1320
TTCATCATCTTCTCTTCCGAAAGTCGGGTCTTCACTATGGGTACAAAAGTCGTAATAGTC 

I- I I • I • I -I • II 

1269 1277 1284 1295 1317 

1269 1295 1320 

1274 1301 

1274 1301 

1301 

Cvi JI 
Tse 1 
Fnu 4HX 

Cvi Jl Nia III >BsmFl Bth CI 

NlalV Msel Fatl >TspRX >BbvX 

>CstMX >RleAX DraX Cvi All HpyCH4III >rthlllll 

I I I I II I II I III 

AAGGAGCCACCCCACAAGATTTAAACACCATGCTAAACACAGTGGGGGGACATCAAGCAG 1380

TTCCTCGGTGGGGTGTTCTAAATTTGTGGTACGATTTGTGTCACCCCCCTGTAGTTCGTC 

III 'I II h II I • I I h 

1321 1331 1340 1349 1359 1374 

1323 1341 1349 1360 1377 

1325 1349 1367 1377 

1377 
1377 
1379 
<Bsr X 

HpyCH4V <StsX 
Sfc X <Fok I 

Tse I <Bst F5I 

Fi3U 4HI >SfaNI 
Hpy CH4V Bth CI Mwo X 

NlalXX <BbvX Bst API 

Msl I >BCC I Cvi Jl >Bsc AI 

Fatl <BsinAI Alul Apa BI 

Cvi All Msel <BsaX <Mnl X PstZ Hpy CH4V 

II I II I I I II I I I I 

CCATGCAAATGTTAAAAGAGACCATCAATGAGGAAGCTGCAGAATGGGATAGAGTGCATC 1440
GGTACGTTTACAATTTTCTCTGGTAGTTACTCCTTCGACGTCTTACCCTATCTCACGTAG 

II -I 1*1 I MM- * I II I 

1382 1392 1398 1410 1417 1435 

1382 1398 1415 1436 

1382 1402 1415 1436 

1382 1416 1436 

1384 1416 1436 

1416 1436 

1416 1437 



1417 1437 
1418 1437 



1440 
Hpy CH4V
Nl a 111 

Fatx HaeXlZ Hae 111 

CviAlX CvlJl 
Sph I unb I Hae I 

Nspl Sau 961 5ty D4I 

Mwo 1 Pssl Scr Fl 

CacSl Fmul Psp Gl 

ffpy CH4V CvlJl Hpy CH4V Sty I Tsp 451 

>TspRl £0001091 BstNI Bsa JI Afae III 

I III I III I I II I I 

CAGTGCATGCAGGGCCTATTGCACCAGGCCAGATGAGAGAACCAAGGGGAAGTGACATAG 1500
GTCACGTACGTCCCGGATAACGTGGTCCGGTCTACTCTCTTGGTTCCCCTTCACTGTATC 

J III I ■III I I II • -I -I 

1441 1451 1464 1482 1492 

1482 1492 





1451 


1464 


444 


1453 


1460 


1445 


1452 


1464 


1445 


1451 


1464 


1445 


1452 


1464 


1445 


1452 


1466 


1446 




1467 


1446 


1453 


1467 


1446 






1448 








<Bco 5 7MI 




Rsa I 





>Sts I 
>Fok I 
>BstT5I 
<Bcc I 

Bfal <Acul >Stsl >H±n^l 

Spel >HpyAV >FokX >Hae IV <Bsrl 

>PsrZ CspSX Hpy 188III >Bst F5I <Bmrl 

I II I II I II I I II 

CAGGAACTACTAGTACCCTTCAGGAACAAATAGGATGGATGACAAATAATCCACCTATCC 1560
GTCCTTGATGATCATGGGAAGTCCTTGTTTATCCTACCTACTGTTTATTAGGTGGATAGG 

I II I II I * II I 'I • II 

1504 1513 1520 1533 1559 

1509 1517 1533 1541 1560 

1510 1518 1533 1541 

1513 1534 

1518 1537 

1537 
1537 
5ty D4I 
5crFI 

Psil >H±nAX Psp Gl 

Tsp 5091 >HaelV BstNI 
Apo I <Bcc X Bsa JI Mse x 

III I I . I 

CAGTAGGAGAAATTTATAAAAGATGGATAATCCTGGGATTAAATAAAATAGTAAGAATGT 1620
GTCATCCTCTTTAAATATTTTCTACCTATTAGGACCCTAATTTATTTTATCATTCTTACA 

II I • I -I h 

1570 1582 1592 1599 

1571 1582 1592 

1574 1582 1592 

1592 
1592 
Vpa KllAI
Unb I 
Sau 961 

Psp 031 

MwoX <Bsml FmuX >Fal 1 Hpy BX 

CviJX BstXX Hpy 188III Avail NlalV <BsiaAX Acc X 

W III I II I.I 

ATAGCCCTACCAGCATTCTGGACATAAGACAAGGACCAAAGGAACCCTTTAGAGACTATG 1680
TATCGGGATGGTCGTAAGACCTGTATTCTGTTCCTGGTTTCCTTGGGAAATCTCTGATAC



II III* 
1623 1630 1637 
1624 1633 



Hp a II 
BsrFX 
Bsa WI 
Age I 



Dde I 



Cvx JI 
Mwo I 
I I 



1653 
1653 
1653 
1653 
1653 
1653 
Cvi JI 
Cac 81 
Alu I 
Hln dlll 



I • I 

1661 
1659 



1672 



I 



1680 
1680 



I II 



<Mnl X 
I 



>Sts X 
>FokX 
>Bst F5I 
rsp 5091 



I I 



TAGACCGGTTCTATAAAACTCTAAGAGCCGAGCAAGCTTCACAGGAGGTAAAAAATTGGA 1740
ATCTGGCCAAGATATTTTGAGATTCTCGGCTCGTTCGAAGTGTCCTCCATTTTTTAACCT



II 

1684 
1684 
1684 
1685 



1701 



Vpa KllAI 
Unb X 
Sau 961 
Psp 031 
Fmu X 
Ava XX 



1707 
1706 



I II 
1714 
1715 

1712 

1715 



I 

1725 



I I 



1734 



1738 
1738 
1738 



Mse X 
Dra X 

I II 
TGACAGAAACCTTGTTGGTCCAAAATGCGAACCCAGATTGTAAGACTATTTTAAAAGCAT 
ACTGTCTTTGGAACAACCAGGTTTTACGCTTGGGTCTAACATTCTGATAAAATTTTCGTA 

. • I • • • II 

1757 1790 

1757 1791 

1757 

1757 

1757 

1757 



1800 
Hpa II
StyD4I 
>Sth 1321 
ScrFX 
Ncx I 
SCO HI 
<Sinil 
Vpa KllAI 

Tau I Unb I 

MspUlT 5au96I 

Vpa KllAI Psp 0 31 

UnhX Fnu4HI Nia III Nia IV 

Sau96I Fat I Fmu I Hae III 

Psp 031 Cvi All Avail 

F/nu I Bth CI Nspl Pss I >Gdi II 

Avail >ifi/3 41 PpuMI Cvi J I 

NlalV <Acil >MboXX >HaeXV Fco 01091 

>BsmFX Cvi JI B£a I Ahd I <Mnl 1 EaeX 

I I I I I II III I I I I I I I I 

TGGGACCAGCGGCTACACTAGAAGAAATGATGACAGCATGTCAGGGAGTAGGAGGACCCG 1860
ACCCTGGTCGCCGATGTGATCTTCTTTACTACTGTCGTACAGTCCCTCATCCTCCTGGGC 



1 1 


1 hi 


1 •! 


• 1 


1 1 • 


• llll II 


1 1 


1802 


1811 


1818 


1832 




1852 


1859 


1802 


1809 


1821 


1832 




1853 




1803 






1832 




1853 


1860 


1803 


1809 






1836 


1853 


1859 


1803 








1837 


1854 




1803 








1837 


1854 


1860 


1803 


1809 






1837 


1854 




1803 


1807 
1809 








1854 
1854 
1854 
1854 

1855 





1857 
1857 
1857 
1857 
1857 
1858 

>EcoSll\X Tsp 5091 

>Acu I Cvi JI Apo I Cvi JI 

Cvi JI >BsrDX Afaelll Alul Mslx 

III I llll I 

GCCATAAGGCAAGAGTTTTGGCTGAAGCAATGAGCCAAGTAACAAATTCAGCTACCATAA 1920
CGGTATTCCGTTCTCAAAACCGACTTCGTTACTCGGTTCATTGTTTAAGTCGATGGTATT 

II I • I I- II I I • 

1880 1887 1899 1910 1916 

1882 1893 1904 1910 

1882 1905 

HpyCVL^V 

<Sfa'^X rsp509I rsp509I 

<BscKX <MnlX NlaXV MseX MfeX 

II I I I I II 

TGATGCAGAGAGGCAATTTTAGGAACCAAAGAAAGATTGTTAAGTGTTTCAATTGTGGCA 1980
ACTACGTCTCTCCGTTAAAATCCTTGGTTTCTTTCTAACAATTCACAAAGTTAACACCGT 

II I I • I • I II 

1922 1930 1942 1960 1970 

1922 1935 1971 

1924 
Unb I

Sau 961 

Nla IV 

Hae III 

Fmu I 

Cvl JI 
Unb I 
Sau 961 
Pss I 

Psp OMI 

Wla IV 
Fjnu I 
JsrcoOl09i 
Bspl286I 
Bme 15 8 01 
Ban II 
Apal BfaZ 
PssX Styl 
BsplZBSl ECOO1091 

B/ne 15801 Hpy CH4V Bsa JI <Mmel 

<HpyAV CviJT TspSOSX AvrZZ Cvi JI 

II I I I I II I I II 

AAGAAGGGCACACAGCCAGAAATTGCAGGGCCCCTAGGAAAAAGGGCTGTTGGAAATGTG 2040
TTCTTCCCGTGTGTCGGTCTTTAACGTCCCGGGGATCCTTTTTCCCGACAACCTTTACAC 

I I • I •! I llh II • II- 

1983 1994 2001 2013 2025 

1986 2004 2013 2029 

1986 2007 

2007 2013 
2008 2014 
2008 
2008 
2008 
2008 
2008 
2008 
2008 
2008 
2008 
2008 

2009 

2009 

2009 

2009 

2009 

2009 

Mbo I 

Dde I Dpn I 

<Bsp CNI Cha I 

<BseMII Bst KTI 

RsaZ Bst Yl 

Csp SI Cvi JI BglTX 

<HpyKV >TspDTl Tat Z <BsmAZ Tsp509Z >Mbo Zl 

I I II I I I I III 

GAAAGGAAGGACACCAAATGAAAGATTGTACTGAGAGACAGGCTAATTTTTTAGGGAAGA 2100
CTTTCCTTCCTGTGGTTTACTTTCTAACATGACTCTCTGTCCGATTAAAAAATCCCTTCT 

I • I • II •! I -I I • I !!• 

2046 2058 2067 2075 2085 2096 

2068 2081 2098 

2068 2098 

2071 2099 

2071 2099 

2071 2099 

2099 




StyD4I 
ScrFl 
Psp Gl 
Bstm 

Bsa Jl Hpy IQBZ 

>HpyAV Hae III <Eco 57M1 

Hae III CvlJI <AcuX 

Cvi Jl HaeX Tsp 5 0 91 

Hae I <HpyAV Apo 1 <Mbo 11 Cvi 31 Cvi Jl 

III I I I I II III I f 

TCTGGCCTTCCTACAAGGGAAGGCCAGGGAATTTTCTTCAGAGCAGACCAGAGCCAACAG 2160
AGACCGGAAGGATGTTCCCTTCCGGTCCCTTAAAAGAAGTCTCGTCTGGTCTCGGTTGTC 

III- hill II III- -I I- 

2103 2119 2129 2135 2152 2159 

2104 2121 2130 

2104 2122 2136 

2106 2122 2136 

2124 2138 

2124 

2124 

2124 

2124 

Hpy 18 81 

Cv± Jl >Bsa XI Dde I 

Alul >BsaXl >flsp CNI 

>Mbo 11 <Sco 5 7MI >Hi22 4I >Bse Mil 

<Ear 1 <Acu 1 OsmAl >Mnl I Nla IV 

III III ill I 

CCCCACCAGAAGAGAGCTTCAGGTCTGGGGTAGAGACAACAACTCCCCCTCAGAAGCAGG 2220
GGGGTGGTCTTCTCTCGAAGTCCAGACCCCATCTCTGTTGTTGAGGGGGAGTCTTCGTCC 

I- I I • - III • III I- 

2169 2177 2193 2208 2219 

2169 2177 2195 2209 

2175 2196 2209 

2175 2196 2209 



2210 



Tsp 4 51 
Mae 111 



Dde 1 

>Bsp CNI <Siinl 

Msel >Bse Mil >Hin 41 

>Bci VI >Mnl 1 >Hae IV 

Cvi Jl Hpy CH4III Bsu 361 Ahd I 

I I I I II I 1 
AGCCGATAGACAAGGAACTGTATCCTTTAACTTCCCTCAGGTCACTCTTTGGCAACGACC 2280
TCGGCTATCTGTTCCTTGACATAGGAAATTGAAGGGAGTCCAGTGAGAAACCGTTGCTGG 

I • II 1 • II -I • I • 

2221 2237 2255 2277 

2240 2255 2277 

2247 2256 2277 

2256 2277 



2256 



22 61 
2261 



rsp45I 

Waelll Cvi Jl 

>Mnll Alul 
I I I 

CCTCGTCACAATAAAGATAGGGGGGCAACTAAAGGAAGCTCTATTAGATACAGGAGCAGA 2340
GGAGCAGTGTTATTTCTATCCCCCCGTTGATTTCCTTCGAGATAATCTATGTCCTCGTCT 

II- • - I • 

2281 2317 

2285 2317 
2285 
>Mbo II
5tyD4I 
Scr FI 
PspQX 

>MboTX BstXZ 
Hpy CH4III Bstm <BccZ 

II III 
TGATACAGTATTAGAAGAAATGAGTTTGCCAGGAAGATGGAAACCAAAAATGATAGGGGG 2400
ACTATGTCATAATCTTCTTTACTCAAACGGTCCTTCTACCTTTGGTTTTTACTATCCCCC 
I . I . I- I I • 

2345 2369 2376 

2354 2369 
2369 
2369 
2369 

2373 
Hpy 18 81 
Mbo I 
Dpn I 
Cha I 

<Mnll Est KTl 

<Hin 41 Bell 
rsp509I Hpy CH4III Hpy 81 

II I II I I 

AATTGGAGGTTTTATCAAAGTAAGACAGTATGATCAGATACTCATAGAAATCTGTGGACA 2460
TTAACCTCCAAAATAGTTTCATTCTGTCATACTAGTCTATGAGTATCTTTAGACACCTGT 

II- • I -11 I • -I 

2401 2425 2454 

2406 2431 
2406 2432 

2432 
2432 
2432 
2434 

l^paKllAI 

Unb I 

Sau 961 

Psp 031 

Fmu I 

Avail 
Sse8647I 
SfcT Hpy CH4III PssX 
CviJZ RsaX Ppu MI Hpy 8 1 >Mbo 11 

Alul CspSl EcoOlQ^l Hindi Tsp509l 

II II II I II 

TAAAGCTATAGGTACAGTATTAGTAGGACCTACACCTGTCAACATAATTGGAAGAAATCT 2520
ATTTCGATATCCATGTCATAATCATCCTGGATGTGGACAGTTGTATTAACCTTCTTTAGA 

11-11 • II ' I • I •! 

2464 2472 2485 2498 2506 

2464 2472 2485 2498 2511 

2466 2474 2485 

2485 

2486 

2486 

2486 

2486 

2486 

2486 
Hpy 188I
Dde I 
>Bsp CNI 
>BseMII 
<Cjfel 
<PleI 
<Mlyl 
Hin f I 

<Cje PI Apo I J?sa I 

>Bsp2 41 MseX Csp 61 

Hpy SI Oral ifpyCH4III 

Hindi Bpy CH4V Tsp 5091 Cvi JI <Bs2nAl <Bsrl 

I MM I MM I I I I I 

GTTGACTCAGATTGGTTGCACTTTAAATTTTCCCATTAGCCCTATTGAGACTGTACCAGT 2580
CAACTGAGTCTAACCAACGTGAAATTTAAAAGGGTAATCGGGATAACTCTGACATGGTCA 

I MM • I • M M • I • I I I I • 

2521 2537 2546 2558 2567 2576 

2521 2542 2570 

2524 2543 2573 

2524 2545 2573 

2524 
2524 
2524 
2525 
2526 
2526 
2526 
2527 

Unb I 
5au96I 
Haelll 
StyD4I Fjnul 

ScrFl Cvi Jl Hae III 

Psp GI <BccI Cvi JI 

Cvi JI >Stsl Mscl 

Msel Bst NI >Fokl Hae I 

Tsp 5091 >Bst F5I Msel Eael >Mboll 

Mil III I II I 

AAAATTAAAGCCAGGAATGGATGGCCCAAAAGTTAAACAATGGCCATTGACAGAAGAAAA 2640
TTTTAATTTCGGTCCTTACCTACCGGGTTTTCAATTTGTTACCGGTAACTGTCTTCTTTT 

I I Ml III -I Ml -I 

2583 2599 2613 2621 2633 

2585 2591 2599 2621 

2589 2599 2621 

2591 2600 2622 

2591 2603 2622 

2591 2603 
2603 
2603 
2603 

Rsa I 
Tat I 
BsrGI 

Tsp 5 0 91 Apo I 

Apo I Csp 61 <BccI <HpyAV Tsp 5091 Tsp 5091 

II II I I II I 

AATAAAAGCATTAGTAGAAATTTGTACAGAGATGGAAAAGGAAGGGAAAATTTCAAAAAT 2700
TTATTTTCGTAATCATCTTTAAACATGTCTCTACCTTTTCCTTCCCTTTTAAAGTTTTTA 

II- II M M II- I - 

2658 2664 2671 2681 2689 2698 

2659 2688 
2663 
2663 
2664 
Hae III Rsa I

CvlJl Csp 61 

Unb I <Bsr I Tatl 

Sau 961 <Eco 57M1 Seal 

Fittul <BpmX HpyCH4lII 

II II III 

TGGGCCTGAAAATCCATACAATACTCCAGTATTTGCCATAAAGAAAAAAGACAGTACTAA 2760
ACCCGGACTTTTAGGTATGTTATGAGGTCATAAACGGTATTTCTTTTTTCTGTCATGATT 

II • • II • • -I II • 

2702 2724 2751 

2702 2724 2753 

2702 2726 2753 

2703 2754 

2703 2754 

Hpy 1 8 8 1 1 1 
Sml I 

rsp50 9I Hpy 18 81 Msel <Bpu El 

I I I I I 

ATGGAGAAAATTAGTAGATTTCAGAGAACTTAATAAGAGAACTCAAGACTTCTGGGAAGT 2820

TACCTCTTTTAATCATCTAAAGTCTCTTGAATTATTCTCTTGAGTTCTGAAGACCCTTCA 

h -I I • II • 

2769 2781 2790 2802 

2802 
2803 

>Sts I 
<Cje I 

>Aci I Rsa I 

>Sthl32I Csp 61 

>Fau I rat I >Fok I 

<Stsl Seal >Bst F5I 

<FoJtI Hpy CH4 1 II 

rsp509I <Bst F5I Msel Mae 111 >Bsrl 

I I II I I III I I 

TCAATTAGGAATACCACATCCCGCAGGGTTAAAAAAGAAAAAATCAGTAACAGTACTGGA 2880
AGTTAATCCTTATGGTGTAGGGCGTCCCAATTTTTTCTTTTTTAGTCATTGTCATGACCT 

I • III h - I I II I I ' 

2823 2837 2849 2867 2875 

2837 2870 
2837 2872 2878 

2840 2872 2878 

2840 2873 
2841 2873 

2875 

2878 

Hpy CH 4 V 
PpulOl 

Nsil <Eco 57M1 

<SfaNI . <Acu I 

<Bsc AI >Mboll Hpy 81 

>Ssp D51 >Bbsl Bstzni 

>HphX >Tsp DTI Hpy 18 811 1 

<HleAI Bfr HI Ddel >Bbrll Accl Hpy CH4V 

I I III I I I I I II 

TGTGGGTGATGCATATTTTTCAGTTCCCTTAGATGAAGACTTCAGGAAGTATACTGCATT 2940

ACACCCACTACGTATAAAAAGTCAAGGGAATCTACTTCTGAAGTCCTTCATATGACGTAA 

I I III • I • I I I I h I - 

2881 2889 2908 2915 2929 2935 

2885 2913 2922 

2885 2915 2929 

2888 2915 2929 

2888 2920 
2889 2920 
2889 
2890 
>Hin 4I
StyD4I 
ScrFl 
PspGX 

Bst NI RsaX 
Bfal Bsa Jl EcoRV Csp 61 MslX 

>Cjel <BsmAT >HaeIv Tat I Bst XI 

II I I I I II II 

TACCATACCTAGTATAAACAATGAGACACCAGGGATTAGATATCAGTACAATGTGCTTCC 3000
ATGGTATGGATCATATTTGTTACTCTGTGGTCCCTAATCTATAGTCATGTTACACGAAGG 

II- • I I- I I- II • II 

2943 2963 2974 2985 2999 

2949 2969 2979 2986 3000 

2969 2986 

2969 



2969 
2969 



2974 



>CjePI 
<Ssp D5I 
<Hph I 
Mbo I 

<Bcc I Dpn I 
>Stsl ChaX NlaXXX 

>FokX BstKTX FatX Cvi JX 

>Bst F5X >AlwX SspX Cvi AXX Dde X 

II II I I I I II 

ACAGGGATGGAAAGGATCACCAGCAATATTCCAAAGTAGCATGACAAAAATCTTAGAGCC 3060
TGTCCCTACCTTTCCTAGTGGTCGTTATAAGGTTTCATCGTACTGTTTTTAGAATCTCGG 

II • II I I I • I -I I • 

3005 3014 3025 3040 3052 

3005 3015 3040 3057 

3005 3015 3040 

3006 3015 
3015 
3017 
3017 

3020 

>Sts I Mbo I 

>Fok I Dpn I 

>Sst F5I ChaX 
NlaXXX BstKTX 
>CjeX FatX Bst YI 

Hpy 188III Cvi All >AIvI 

II II II 

TTTTAGAAAACAAAATCCAGACATAGTTATCTATCAATACATGGATGATTTGTATGTAGG 3120
AAAATCTTTTGTTTTAGGTCTGTATCAATAGATAGTTATGTACCTACTAAACATACATCC 

II • • II- II 

3076 3100 3119 

3077 3100 3119 

3100 3120 

3103 3120 

3103 3120 

3103 3120 
Dde I
<Bsp CNI 
<Bse MI I 

CVX J I 

Tse I Aiul 
Fnu4HI <AceIII 
Dde I BthCZ <Mnll <Hl n 41 

Hpy 18 81 >BJbvI >Bse RI <BsinAI 

II I I II I I 

ATCTGACTTAGAAATAGGGCAGCATAGAACAAAAATAGAGGAGCTGAGACAACATCTGTT 3180
TAGACTGAATCTTTATCCCGTCGTATCTTGTTTTTATCTCCTCGACTCTGTTGTAGACAA 

II- I' • I -III I • 

3122 3139 3158 3166 

3127 3139 3158 3166 

3139 3161 
3139 3162 

3162 
3164 
3164 
3164 

>Sts I 

BstXl >Fokl 
>BsinFl >Alol >Bst F5I 

<WnI I Hpy 18 81 >Mnl 1 BslX <Bcc 1 

II I I I I I II 

GAGGTGGGGACTTACCACACCAGACAAAAAACATCAGAAAGAACCTCCATTCCTTTGGAT 3240
CTCCACCCCTGAATGGTGTGGTCTGTTTTTTGTAGTCTTTCTTGGAGGTAAGGAAACCTA 

I I • ' -I •! I I • I II • 

3181 3214 3224 3232 3238 

3187 3221 3237 

3227 3237 
3237 

<Sts I Cvi JI 

<Fok I Rsa I Tse 1 

<Bst F5I Csp 61 Sfcl Fnu 4HI 

>Bccl rati Mwol Bth CI 

>rspDTI Hpy 188III ifpyCH4lII <Bbv I 

I III I II II I I 

GGGTTATGAACTCCATCCTGATAAATGGACAGTACAGCCTATAGTGCTGCCAGAAAAAGA 3300

CCCAATACTTGAGGTAGGACTATTTACCTGTCATGTCGGATATCACGACGGTCTTTTTCT 

I • II I • I'll 11 h I • 

3246 3256 3269 3286 

3253 3271 3277 3286 

3254 3272 3279 3286 

3254 3272 3286 

3254 3276 

Cvi JI 
Alu I 

Pvull ffpyCH4III rsp509I 

MspAll rsp509l Hpy 1881 

II I II I 
CAGCTGGACTGTCAATGACATACAGAAGTTAGTGGGGAAATTGAATTGGGCAAGTCAGAT 3360

GTCGACCTGACAGTTACTGTATGTCTTCAATCACCCCTTTAACTTAACCCGTTCAGTCTA 

III- • • I- I • I • 

3301 3339 3355 

3301 3308 3344 

3302 

3302 
Sty D4I
Scr Fl 
Psp GZ 
BstNI 

BsaJI <Mjii I 

Pas I Hpy 8 1 Dde I 

BsaJI MseZ Tsp 5091 <Cst Ml NlalV 

II I I I I I II 

TTACCCAGGGATTAAAGTAAGGCAATTATGTAAACTCCTTAGAGGAACCAAAGCACTAAC 3420
AATGGGTCCCTAATTTCATTCCGTTAATACATTTGAGGAATCTCCTTGGTTTCGTGATTG 

II - I • I I I I • I I • 

3364 3372 3384 3395 3404 

3364 3390 3398 

3365 3402 

3365 

3365 

3365 

3365 

Bfa I 

CviJX Tfll 
>MboXX AluT >Bsrl Hint! 

I II I I 

AGAAGTAATACCACTAACAGAAGAAGCAGAGCTAGAACTGGCAGAAAACAGAGAGATTCT 3480
TCTTCATTATGGTGATTGTCTTCTTCGTCTCGATCTTGACCGTCTTTTGTCTCTCTAAGA 

I I I I • ■ I • 

3440 3450 3457 3475 

3450 3475 
3452 




>Bcc I 
<Slm I Mse I 

I I I 

AAAAGAACCAGTACATGGAGTGTATTATGACCCATCAAAAGACTTAATAGCAGAAATACA 3540
TTTTCTTGGTCATGTACCTCACATAATACTGGGTAGTTTTCTGAATTATCGTCTTTATGT 

Mil- hi • I • 

3488 3494 3509 3524 

3488 3512 
3488 
3490 
3491 
3491 

3494 
3494 

Haelll DraJ 
Cvi Jl rsp5 09l CviJl 

HaeX ApoX Hpy IQSXXX MseX Hpy 1881 

II II I I II I 

GAAGCAGGGGCAAGGCCAATGGACATATCAAATTTATCAAGAGCCATTTAAAAATCTGAA 3600
CTTCGTCCCCGTTCCGGTTACCTGTATAGTTTAAATAGTTCTCGGTAAATTTTTAGACTT 

• II • II I • I II • I • 

3553 3570 3577 3588 3595 

3554 3571 3582 

3554 3587 
Bsp1286I
Bjne 15801 
WlalV 

Ban I Msl I Mse I <Mnl I 

HpyCH4V <Mnll >RleAl Tsp 5091 

I I II I I III 

AACAGGAAAATATGCAAGAATGAGGGGTGCCCACACTAATGATGTAAAACAATTAACAGA 3660
TTGTCCTTTTATACGTTCTTACTCCCCACGGGTGTGATTACTACATTTTGTTAATTGTCT 

• I • I II . II • 'II I- 

3613 3622 3630 3651 

3626 3632 3653 3659 

3626 
3627 
3627 

Mse I 

Hpy CH4V <Plel DraX 

>TspRl <Mlyl Tsp 5 091 

>Bts I Bin fx Apol 

III I II II 

GGCAGTGCAAAAAATAACCACAGAAAGCATAGTAATATGGGGAAAGACTCCTAAATTTAA 3720
CCGTCACGTTTTTTATTGGTGTCTTTCGTATCATTATACCCCTTTCTGAGGATTTAAATT 

III- • • • I • I I I I • 

3662 3706 3713 

3663 3706 3714 

3666 3706 3716 

3717 
5tyD4I 
ScrFl 

<CjePT HpySX PspGX 
NlaXXX NlaXXX >CjePl 
Fat I Fat I CviJX 

Cvi All CvlAXX CacBX BstUX 

I I I I I I I I 

ACTGCCCATACAAAAGGAAACATGGGAAACATGGTGGACAGAGTATTGGCAAGCCACCTG 3780
TGACGGGTATGTTTTCCTTTGTACCCTTTGTACCACCTGTCTCATAACCGTTCGGTGGAC 

•I III • I- I I I • 

3741 3750 3769 3777 

3741 3750 3772 

3741 3750 3774 

3746 3754 3777 

3777 
3777 

Rsa I 

DdeX Cspex 

<BspCNI NlaXV 

<BseHXX KpnX 
Hpy 188III BanX 
TflX DdeX Acc 6 51 

Bin fx MseX >MnlX Tsp 50 91 <BsrX 

III I II I II I 

GATTCCTGAGTGGGAGTTTGTTAATACCCCTCCCTTAGTGAAATTATGGTACCAGTTAGA 3840
CTAAGGACTCACCCTCAAACAATTATGGGGAGGGAATCACTTTAATACCATGGTCAATCT 

III- -I I- I -I II- I 

3781 3801. 3809 3822 3832 

3781 3814 .3828 

3784 3828 

3786 3828 

3786 3828 

3786 3829 

3829 
Cvi JI
Tse I 
Fiiu 4HI 
Bth CI 
>Bhv I 

Bsi I >HpyAV <SccI Alul <SSJnAI 

I I III I 

GAAAGAACCCATAGTAGGAGCAGAAACCTTCTATGTAGATGGGGCAGCTAACAGGGAGAC 3900
CTTTCTTGGGTATCATCCTCGTCTTTGGAAGATACATCTACCCCGTCGATTGTCCCTCTG 
I . . I . I • II • I • 

3848 3867 3878 3886 3896 

3884 
3884 
3884 
3884 
3886 

>/fjboII <5sp D5I 

>Bhs I <Hph I 

>Sbr7I Tsp 4 51 

Tsp 5091 Waelll <Wni I /laelll 

I III M 

TAAATTAGGAAAAGCAGGATATGTTACTAATAGAGGAAGACAAAAAGTTGTCACCCTAAC 3960
ATTTAATCCTTTTCGTCCTATACAATGATTATCTCCTTCTGTTTTTCAACAGTGGGATTG 

I • • I . • I I • II 

3903 3923 3933 3950 

3936 3950 

3936 3951 

3936 3951 

Dde I 
<Bsp CNI 
<Bse MI I 

>J»fJboII Hpy 188III 

>Bbs I Cvl J I Tf i I 

>BJbr7I >rthlllll Bf a I Hpy CH4V <Stiil32I 

Hpy 1881 /faelll rsp509I AIu I Hint! 

II III I II I I II 

TGACACAACAAATCAGAAGACTGAGTTACAAGCAATTTATCTAGCTTTGCAGGATTCGGG 4020
ACTGTGTTGTTTAGTCTTCTGACTCAATGTTCGTTAAATAGATCGAAACGTCCTAAGCCC 

• I I -I I I- I -11 I • I II • 

3973 3985 3994 4003 4013 

3976 3989 4001 4008 4017 

3976 4003 4013 

3976 4016 
3981 
3981 
3981 

<Plel Hpy CH4V 

<MlyX Ppu 101 

HintX Nsll Tfil 

Hpy 81 MaeXXZ BfrBX Bin fx >Tth llllX 

I II II I I 

ATTAGAAGTAAACATAGTAACAGACTCACAATATGCATTAGGAATCATTCAAGCACAACC 4080
TAATCTTCATTTGTATCATTGTCTGAGTGTTATACGTAATCCTTAGTAAGTTCGTGTTGG 

I • I • I • II • I I 

4028 4037 4053 4062 4070 

4043 4053 4062 

4043 4053 

4043 4054 
Mbo I

Dpn I Hpy 18 81 

Cha I Tfi I 

Bst KTI Hi/If I Msel 

III I 
AGATCAAAGTGAATCAGAGTTAGTCAATCAAATAATAGAGCAGTTAATAAAAAAGGAAAA 4140
TCTAGTTTCACTTAGTCTCAATCAGTTAGTTTATTATCTCGTCAATTATTTTTTCCTTTT 

I -I I ■ • • I • 

4082 4091 4124 

4082 4091 
4082 4094 
4082 

Rsa I 
Csp 61 
NlalV 
Kpn I 
Ban I 
ACC65I 
Nia III 

Fat I <Mnl I 

Cvi All Tsp 5091 >rspDTI Tsp 5091 

I II III I 

GGTCTATCTGGCATGGGTACCAGCACACAAAGGAATTGGAGGAAATGAACAAGTAGATAA 4200
CCAGATAGACCGTACCCATGGTCGTGTGTTTCCTTAACCTCCTTTACTTGTTCATCTATT 

•III- • I I- I • I 

4152 4174 4185 4200 

4152 4179 
4152 

4156 
4156 
4156 
4156 

4157 

4157 
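The position numbers printed under each pair of sequence lines appear to give the first base of a recognition sequence on the top strand. On line 4141-4200, for example, the listed positions include 4152, 4156 and 4157, which match CATG (NlaIII, FatI, CviAII), GGTACC (KpnI, Acc65I) and GTAC (RsaI, Csp6I) in that line. The Python sketch below is built on that inferred numbering convention, which the map itself does not state. It handles only exact, non-degenerate recognition sequences and reproduces those three positions. The names `find_sites`, `SITES` and `LINE_4141` are hypothetical.

```python
"""Sketch: locate exact recognition sequences in one line of the map.

Assumption (inferred, not stated in the map): a listed position is the
1-based coordinate of the first base of the recognition site on the top strand.
"""

# Top strand of map line 4141-4200, transcribed from the map.
LINE_4141 = "GGTCTATCTGGCATGGGTACCAGCACACAAAGGAATTGGAGGAAATGAACAAGTAGATAA"

# Exact (non-degenerate) recognition sequences only; IUPAC codes are not handled.
SITES = {
    "NlaIII": "CATG",
    "RsaI": "GTAC",
    "KpnI": "GGTACC",
}


def find_sites(seq: str, site: str, first_position: int = 1) -> list[int]:
    """Return map positions where `site` begins in `seq`, allowing overlaps."""
    seq, site = seq.upper(), site.upper()
    hits = []
    start = seq.find(site)
    while start != -1:
        hits.append(first_position + start)  # convert 0-based index to map coordinate
        start = seq.find(site, start + 1)    # step one base so overlapping hits are kept
    return hits


if __name__ == "__main__":
    for enzyme, site in SITES.items():
        print(enzyme, site, find_sites(LINE_4141, site, first_position=4141))
```

Scanning one 60-base line at a time keeps the coordinate arithmetic trivial. A site that spans two printed lines would be missed, however, so a real check would join consecutive lines first.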

Rsa I Unb I 

Csp6I Sau96I 
Hpy 188III Haelll 
Tfi I rati F/nuI. 

>rspRI HinfX Seal <Bcc 1 CvlJX >rspDTI 

I I I II I II 

ATTAGTCAGTGCTGGAATCAGGAAAGTACTATTTTTAGATGGAATAGATAAGGCCCAAGA 4260
TAATCAGTCACGACCTTAGTCCTTTCATGATAAAAATCTACCTTATCTATTCCGGGTTCT 

I • I I • II • I ' -I I 

4207 4215 4225 4238 4252 4260 

4215 4225 4252 

4218 4252 

4226 4252 

4226 4252 

Wlalll Tsp 5091 BfaZ 

Fatl Hpy CH4III Cvi JI >BspMI SfcJ 

Cvi All <CjeI >BsrDX Msel >Cjel 

I II III I I I I 

TGAACATGAGAAATATCACAGTAATTGGAGAGCAATGGCTAGTGATTTTAACCTGCCACC 4320
ACTTGTACTCTTTATAGTGTCATTAACCTCTCGTTACCGATCACTAAAATTGGACGGTGG 

I • I • I •III- I -I I I 

4265 4278 4292 4308 4316 

4265 4278 4297 4311 4320 

4265 4283 4299 
Hpy CH4V
Ppu 101 

Cvi JI Nsi I 

Alul BfrBl 
PvulX NlalXl 
Msp All Fat I 

Cac 81 Cvi JX Cvi J I 

Cvi JI Aiul >Cst MI Cvi All 

II II I I I III 

TGTAGTAGCAAAAGAAATAGTAGCCAGCTGTGATAAATGTCAGCTAAAAGGAGAAGCCAT 4380
ACATCATCGTTTTCTTTATCATCGGTCGACACTATTTACAGTCGATTTTCCTCTTCGGTA 

• II II • -I I I III 

4342 4362 4368 4378 

4343 4362 4375 

4345 4378 
4345 4378 
4346 4379 
4346 4379 

4379 
4380 

5ty D4I 
ScrFX 

SfcX PspGX RsaX 
Hpy CH4III Hpy 81 

Nla III PsbAl BstXX CspSX 

Fat I Hpy8I Bst NI Tat I 

Cvi All Acci Pi'ol BfaX BsrGX <HpyAV 

I I III II I II I 

GCATGGACAAGTAGACTGTAGTCCAGGAATATGGCAACTAGATTGTACACATTTAGAAGG 4440
CGTACCTGTTCATCTGACATCAGGTCCTTATACCGTTGATCTAACATGTGTAAATCTTCC 

I -I III • II • I • II - . I • 

4382 4391 4402 4418 4424 4436 

4382 4391 4403 4424 

4382 4394 4403 4425 

4395 4425 
4396 4403 4425 
4403 
4403 

Cvi JI 
StyD4l Nlalll 
ScrFX FatX 
PspGX Cvi All >TspRX 

BstUX <Tsp DTI <BsrX XmnX 

I I I I II I 

AAAAGTTATCCTGGTAGCAGTTCATGTAGCCAGTGGATATATAGAAGCAGAAGTTATTCC 4500
TTTTCAATAGGACCATCGTCAAGTACATCGGTCACCTATATATCTTCGTCTTCAATAAGG 

I -I I I II • I 

4450 4461 4470 4490 

4450 4463 4471 

4450 4463 
4450 4463 

4468 
Hae III
Cvi JI 
Msc I 
Hae I 

rsp 50 91 EaeZ 
Mse I <Bcc I 

I?ra I >Wi>o 1 1 <Bsr I 

II I I Mil 

AGCAGAAACAGGGCAGGAAACAGCATATTTTCTTTTAAAATTAGCAGGAAGATGGCCAGT 4560
TCGTCTTTGTCCCGTCCTTTGTCGTATAAAAGAAAATTTTAATCGTCCTTCTACCGGTCA

II I • I • I I I 1 

4534 4548 4556 

4535 4551 

4539 4553 
4553 
4553 
4554 
4554 
Bsl I 
>AC± I 
Tau I 

HpalX F23U4HI 
BsrFl BthCI 
Bsa wi Hae III 

rsp 50 91 Cvi JI 

Tse I Age I C^nb I 

Fnu4Hl SgrAX Sau 961 

Bth CI <SspD5I Fmu I 

<Cjel >BbvT <HphX HpyCH4III Bsl I 

I I I III! I llll I 

AAAAACAATACATACTGACAATGGCAGCAATTTCACCGGTGCTACGGTTAGGGCCGCCTG 4620
TTTTTGTTATGTATGACTGTTACCGTCGTTAAAGTGGCCACGATGCCAATCCCGGCGGAC 

I • I h llll • I •MM I • 

4574 4584 4593 4604 4617 

4584 4593 4611 

4584 4594 4611 

4584 4595 4611 

4589 4612 
4595 4612 
4595 4613 
4596 4613 
4613 
4614 
4614 

Tfi I 

Hlnfl Tsp 5091 

<Sth 1321 Tsp 5091 

<Fau I Apo I EcoRZ 

<AciZ >Tth lllXX Apol >Cst MI 

M I I II M I 

TTGGTGGGCGGGAATCAAGCAGGAATTTGGAATTCCCTACAATCCCCAAAGTCAAGGAGT 4680
AACCACCCGCCCTTAGTTCGTCCTTAAACCTTAAGGGATGTTAGGGGTTTCAGTTCCTCA 

11*11-11 II •* -I 

4628 4636 4650 4674 

4628 4643 4650 

4629 4644 

4632 4651 

4632 
Cvi JI

Mho I 
Dpn I 

Tfxl >rspDTI Msel ChaT 

Hlnfl Tsp 5091 Tsp509I BstKTI 

II III II 

AGTAGAATCTATGAATAAAGAATTAAAGAAAATTATAGGACAGGTAAGAGATCAGGCTGA 
TCATCTTAGATACTTATTTCTTAATTTCTTTTAATATCCTGTCCATTCTCTAGTCCGACT 



4740 



I 

4685 
4685 



4691 



I I 
4701 
4703 



4711 



I 



4730 
4730 
4730 
4730 



I 



4735 



Mse 1 
Sml I 
Afl II 
I I 



Rsa I 
Csp 61 

Tat 1 

I I 



<Sts I 

<Fokl Msel 
<Bst F51 Oral 
<TspDTI Tsp509I



ACATCTTAAGACAGCAGTACAAATGGCAGTATTCATCCACAATTTTAAAAGAAAAGGGGG
TGTAGAATTCTGTCGTCATGTTTACCGTCATAAGTAGGTGTTAAAATTTTCTTTTCCCCC



II ' 
4745 
4745 
4746 



I 






4800 



4756 
4757 
4757 



1i 

4781 

4784 
4785 



• I I 
4772 
4774 
4774 
4774 

>Bsgr I 
>rsp RI 
Hpy CH4III 
RsaZ Hpy CH4V 
Csp 61 

3 11111 I 
GATTGGGGGGTACAGTGCAGGGGAAAGAATAGTAGACATAATAGCAACAGACATACAAAC
CTAACCCCCCATGTCACGTCCCCTTTCTTATCATCTGTATTATCGTTGTCTGTATGTTTG
I II II * -I 



Hpy 81 
Acc I 



4860 



4810 
4810 



4816 



4832 
4832 



4812 
4813 
4815 



Tsp 5091 
I 



Apo I



Tsp 5 091 




4866



Tsp 50 91 
Apo I 

Tsp 5 091 <Sth 1321 



>BsmFI



TAAAGAATTACAAAAACAAATTACAAAAATTCAAAATTTTCGGGTTTATTACAGGGACAG
ATTTCTTAATGTTTTTGTTTAATGTTTTTAAGTTTTAAAAGCCCAAATAATGTCCCTGTC

h II • II -iT 



4920 



4879 




4914 
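In each pair of sequence lines, the lower line appears to be the base-by-base complement of the upper one, written beneath it in the same left-to-right order rather than reversed. The lower line therefore acts as a built-in check against scanning damage. The Python sketch below flags every position where the lower line is not the complement of the upper. The names `complement`, `mismatches`, `TOP`, `BOTTOM` and `SCANNED_BOTTOM` are hypothetical. The example data is map line 4861-4920, given in repaired form and in the form it was scanned, with two damaged characters.

```python
"""Sketch: flag positions where the lower strand is not the complement of the upper.

Assumption: the lower line is the complement read in the same left-to-right
order as the upper line (written 3'->5'), not a reverse complement.
"""

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(strand: str) -> str:
    """Base-by-base complement, keeping the original orientation."""
    return strand.translate(COMPLEMENT)


def mismatches(top: str, bottom: str, first_position: int = 1) -> list[int]:
    """Map positions at which `bottom` differs from complement(`top`)."""
    if len(top) != len(bottom):
        raise ValueError(f"strand lengths differ: {len(top)} vs {len(bottom)}")
    expected = complement(top)
    return [first_position + i
            for i, (want, got) in enumerate(zip(expected, bottom))
            if want != got.upper()]


# Map line 4861-4920: repaired top strand, repaired lower strand, and the
# lower strand as scanned (two characters damaged).
TOP = "TAAAGAATTACAAAAACAAATTACAAAAATTCAAAATTTTCGGGTTTATTACAGGGACAG"
BOTTOM = "ATTTCTTAATGTTTTTGTTTAATGTTTTTAAGTTTTAAAAGCCCAAATAATGTCCCTGTC"
SCANNED_BOTTOM = "ATTTCTTAATGpTTTTGTTTAATGTTTTTAAGTTTTAAAAqCCCAAATAATGTCCCTGTC"

if __name__ == "__main__":
    print(mismatches(TOP, BOTTOM, 4861))          # -> []
    print(mismatches(TOP, SCANNED_BOTTOM, 4861))  # -> [4872, 4901]
```

The same comparison can be applied to every pair in the map. Any flagged position marks a base where at least one of the two strands needs to be checked against the original figure.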



>Cje PI 
Vpa KllAI 
C/jiJb I 

Sau96I >Mnn. 
Psp 031 <Bse RI >SspD5l 

Fmu I Cvi JI >Bp/i I 

Avail Alul Hpy 188III <HpyAV 

II I I I I II 

CAGAAATCCACTTTGGAAAGGACCAGCAAAGCTCCTCTGGAAAGGTGAAGGGGCAGTAGT 
GTCTTTAGGTGAAACCTTTCCTGGTCGTTTCGAGGAGACCTTTCCACTTCCCCGTCATCA 



4980 



I I 
4940 
4940 
4940 
4940 
4940 
4940 

4943 



I I I I 
4950 4956 
4950 
4952 
4954 



4967 
4964 
4964 
Mbo I
Dpn I 

Tsp 4 51 ChaX 
MaeXll >Mbo 11 BstKTl 

I I I 

AATACAAGATAATAGTGACATAAAAGTAGTGCCAAGAAGAAAAGCAAAGATCATTAGGGA 5040
TTATGTTCTATTATCACTGTATTTTCATCACGGTTCTTCTTTTCGTTTCTAGTAATCCCT

I • • I • I- 

4995 5016 5029 

4995 5029 

5029 

5029 

>5sp D5I <Mnll 
<Bsp MI >Sts 1 

<Bcc I >Hph I Hpy 81 >Fok I 

<CjePX <Aarl Acc 1 >Bst FSl 

I I I I III 

TTATGGAAAACAGATGGCAGGTGATGATTGTGTGGCAAGTAGACAGGATGAGGATTAGAA 5100
AATACCTTTTGTCTACCGTCCACTACTAACACACCGTTCATCTGTCCTACTCCTAATCTT 

1*111 • h I I 

5046 5057 5079 5086 

5053 5060 5079 5086 

5057 5086 

5060 5090 

>Sts I 

NlalXl Bfal <Bccl 

Fatl Ndel Cvi JT >FokX 

Cvi All Ms 11 Alul >Bst F5X 

I I I I II 

CATGGAAAAGTTTAGTAAAACACCATATGTATGTTTCAGGGAAAGCTAGGGGATGGTTTT 5160
GTACCTTTTCAAATCATTTTGTGGTATACATACAAAGTCCCTTTCGATCCCCTACCAAAA 

I • • I • • I I 'I I 

5101 5124 5144 5151 

5101 5124 5144 5151 

5101 5146 5152 

5151 
<Sts X 

<Sts I Rsa I 

<Fok X Hpy 8 1 

>Mnl I Csp 61 Bsl I 

Cvi JI XmnX Tat 1 <Fok 1 BfaX 

MslX >Tsp DTI <Bst F5X Hpy 1881 <Bst F5I 

I I I I I 1 I II I I I 

ATAGACATCACTATGAAAGCCCTCATCCAAGAATAAGTTCAGAAGTACACATCCCACTAG 5220
TATCTGTAGTGATACTTTCGGGAGTAGGTTCTTATTCAAGTCTTCATGTGTAGGGTGATC 

I • I I -I I -I I- II I I I • 

5166 5173 5184 5199 5210 

5178 5191 5204 5210 5217 

5181 5205 5213 

5184 5205 
5184 5205 

5210 

Bfa X 
<SfaHX 
<BSCAX 

>Sts I >Bsr X 

>FokX Hpy CH 4 V <BsinAI 

>Bst F5I >SlmX <CjePI 

II I II III 

GGGATGCTAGATTGGTAATAACAACATATTGGGGTCTGCATACAGGAGAAAGAGACTGGC 5280
CCCTACGATCTAACCATTATTGTTGTATAACCCCAGACGTATGTCCTCTTTCTCTGACCG

III- ' -11- I • I I • 

5222 5252 5268 

5222 5257 5272 

5222 5275 

5223 

5223 

5227 
PflMI
BSI I 
>BSinAl 

>PleX <Siml 
>Mlyl Hpy SZ 

>Slml Hinfl KMnlX Acc 1 

I I I I I II 

ATTTGGGTCAGGGAGTCTCCATAGAATGGAGGAAAAAGAGATATAGCACACAAGTAGACC 5340
TAAACCCAGTCCCTCAGAGGTATCTTACCTCCTTTTTCTCTATATCGTGTGTTCATCTGG 

I • II !• I- • • I I • 

5285 5293 5309 5334 

5293 5334 
5293 5337 
5295 

5299 
5299 

<Ple I 
<Mly I 
AlwHl 

<rspDTI Hpy 18 81 

Bfal rsp 5091 Hpy CH4III Hinfl 

I II I III 

CTGAACTAGCAGACCAACTAATTCATCTGTATTACTTTGACTGTTTTTCAGACTCTGCTA 5400
GACTTGATCGTCTGGTTGATTAAGTAGACATAATGAAACTGACAAAAAGTCTGAGACGAT 

I • II • I ll-l 

5346 5360 5380 5391 

5362 5388 

5389 
5391 
5391 

Bfa I 

Hae III Sty I 

Cvl JI Bsa JI 

Stui Avrll 

Hae I CvlJX >Tth 111X1 

II . I II I 

TAAGAAAGGCCTTATTAGGACACATAGTTAGCCCTAGGTGTGAATATCAAGCAGGACATA 5460
ATTCTTTCCGGAATAATCCTGTGTATCAATCGGGATCCACACTTATAGTTCGTCCTGTAT 

II • • . Ill - I • 

5407 5430 5448 

5407 5433 

5408 5433 

5408 5433 

5434 

Mbo I 

Dpn I Tse X Mse I 

Cha I Fnu 4HI 

Bst KTI Bti3 CI 

Bst YI BfaX AseX 

>AlwX <Cje X Mwo X >Bbv I 

II I III II- 

ACAAGGTAGGATCTCTACAATACTTGGCACTAGCAGCATTAATAACACCAAAAAAGATAA 5520
TGTTCCATCCTAGAGATGTTATGAACCGTGATCGTCGTAATTATTGTGGTTTTTTCTATT 

II I • I I I II- 

5469 5477 5487 5493 

5469 5490 5498 

5470 5493 

5470 5493 

5470 5493 5499 

5470 
>Mbo II

Mwol MaeTll <Mnll <Bcc X >Bbs 1 

Cvi Jl Bfal KCjePX CvlJl >BbrlZ 

II II I I I I 

AGCCACCTTTGCCTAGTGTTACGAAACTGACAGAGGATAGATGGAACAAGCCCCAGAAGA 5580
TCGGTGGAAACGGATCACAATGCTTTGACTGTCTCCTATCTACCTTGTTCGGGGTCTTCT

M • I I • • I I I- I • 

5521 5533 5553 5569 5576 

5522 5538 5553 5560 5576 

5576 

Hae III 

Cvi JI Sml I 

UnhT Aflll 

Sau96I Cvi JI 

Fmu I Cvi JI Cvi JI Al u I 

Styx NlalV <CjeI Alul <MnlZ Mse 1 

BsaJT <MnlT Xcml >Tsp DTI Bfal >BseRl 

I II 1 II I 1 I II ill! 

CCAAGGGCCACAGAGGGAGCCACACAATGAATGGACACTAGAGCTTTTAGAGGAGCTTAA 5640
GGTTCCCGGTGTCTCCCTCGGTGTGTTACTTACCTGTGATCTCGAAAATCTCCTCGAATT

111*111111' 1*1 I III - 

5581 5593 5600 5607 5618 5630 

5581 5596 5604 5622 5630 5637 

5585 5598 5622 5634 

5585 5634 
5585 5636 
5586 5636 
5586 

Nla III 

Fat I 

Cvi All 
Sty I 
Ncol 

Bfal BtffZ Ddel 

Cvi JI Sty I Bsa JI 

Alul Bsa JI Nla IV <Bpu 101 

>TspDTl Avrll Cvi JI Cvi JI Msll >TspDTX 

II II I II 111 I I 

GAATGAAGCTGTTAGACATTTTCCTAGGATTTGGCTCCATGGCTTAGGGCAACATATCTA 5700
CTTACTTCGACAATCTGTAAAAGGATCCTAAACCGAGGTACCGAATCCCGTTGTATAGAT 

I I * - II • I II -111 - I I 

5643 5663 5673 5681 5693 5700 

5647 5663 5673 5682 

5647 5663 5677 

5664 5677 5683 

5677 
5677 
5678 
5678 
5678 

Mwo I 
Bst API 
rsp 509l 
EcoRl Apa Bl 
<BcxVl Cvi JI Apo I Hpy CH4V 

I I II II 

TGAAACTTATGGGGATACTTGGGCAGGAGTGGAAGCCATAATAAGAATTCTGCAACAACT 5760
ACTTTGAATACCCCTATGAACCCGTCCTCACCTTCGGTATTATTCTTAAGACGTTGTTGA 

. I . . I • M -il 

5713 5734 5745 5751 

5745 5752 
5746 

5752 
5752 



Taq I
Sal I 
Hpy 8 1 

Hindi Tag I <Mnll 

TspSOST EsaBC3l EsaBCSX 
Hpy 1881 Acci Wvol Waelll >Bse RI 

II II I III 

GCTGTTTATCCATTTTCAGAATTGGGTGTCGACATAGCAGAATAGGCGTTACTCGACAGA 5820
CGACAAATAGGTAAAAGTCTTAACCCACAGCTGTATCGTCTTATCCGCAATGAGCTGTCT 

II Ih I • I. • I I- 

5776 5788 5797 5808 5819 

5780 5789 5813 

5788 5813 5819 
5788 
5788 
5789 

StyDAl 

StyD4I ScrFI 

ScrFI PspGJ 

MboX PspGT BstUl 

Dpnl Bstm Pfol 

Cha I Mwo I <Sts I 

Bst KTI CvlJl <Fokl 

KBsrX Bfal Bsp 12861 <Bst F5 1 

CvlJI <AlwI Ban II >S/aNI 

wlaiv Bst Yl Bfal Bsa JI >Bsc Al 

III II I I Mill II II 

GGAGAGCAAGAAATGGAGCCAGTAGATCCTAGACTAGAGCCCTGGAAGCATCCAGGAAGT 5880
CCTCTCGTTCTTTACCTCGGTCATCTAGGATCTGATCTCGGGACCTTCGTAGGTCCTTCA 

I I h II h I Mill il'll 

5835 5844 5854 5860 5868 

5837 5845 5857 5868 

5839 5849 5857 5869 

5845 5858 5869 

5845 5859 5869 

5845 5861 5871 

5845 5861 5872 

5861 5872 

5861 5872 
5872 

Tsp 5091 
Mfe I 

Rsa I <Bsr DI 

Csp 61 <Tsp DTI 
Cvi JI <Tth mil >Fai I MwoZ 

I I I II I I I I 

CAGCCTAAAACTGCTTGTACCAATTGCTATTGTAAAAAGTGTTGCTTTCATTGCCAAGTT 5940
GTCGGATTTTGACGAACATGGTTAACGATAACATTTTTCACAACGAAAGTAACGGTTCAA 

I • I I -11 • I • I II- 

5882 5892 5917 5924 

5897 5927 
5897 5929 
5901 
5902 

>SfaNI <SapX 
Dde X Eco NI Mwo X >Mbo X X 

Bsu 3SX BslX <AciX Hpy 991 

<rsp DTI CviJX >BSCAX >MboXX <BsmAX <Ear X 

I I II I I III II 

TGTTTCATAACAAAAGCCTTAGGCATCTCCTATGGCAGGAAGAAGCGGAGACAGCGACGA 6000
ACAAAGTATTGTTTTCGGAATCCGTAGAGGATACCGTCCTTCTTCGCCTCTGTCGCTGCT 

I • I II • I I- I- I I • I I- 

5944 5955 5963 5979 5988 5999 

5957 5969 5985 5995 

5958 5969 5985 5999 

5963 5999 



Nla III
Fat I 

Cvi Jl Cvi All 

AluX . Pcil 
SacZ <Plel NspX 

ECO ICRl <MlyX AfllXT 

Bsp 12861 HinfX Cvi JI RsaX 

BsiHKAX Hpy CH4III Alu X Csp SX 

BanXX Hpyl88X Hpy ISQX HindXXX TatX 

II I I I I II nil 

AGAGCTCATCAGAACAGTCAGACTCATCAAGCTTCTCTATCAAAGCAGTAAGTAGTACAT 6060
TCTCGAGTAGTCTTGTCAGTCTGAGTAGTTCGAAGAGATAGTTTCGTCATTCATCATGTA 

II I- I I •! II • • II II • 

6002 6009 6018 6029 6054 

6002 6014 6030 6055 

6002 6021 6030 6055 

6002 6021 6057 

6002 6021 6057 

6003 6057 
6003 6058 

6058 
6058 

Mae XXX MwoX 
I I 

GTAACGCAACCTATACCAATAGTAGCAATAGTAGCATTAGTAGTAGCAATAATAATAGCA 6120
CATTGCGTTGGATATGGTTATCATCGTTATCATCGTAATCATCATCGTTATTATTATCGT 
I . . I . . . . 

6061 6085 
Vpa KllAI 
Unb I 
Sau 961 
Psp 031 

Fmu I Mse I 

Ava XX Ssp I 

I I I 

ATAGTTGTGTGGTCCATAGTAATCATAGAATATAGGAAAATATTAAGACAAAGAAAAATA 6180
TATCAACACACCAGGTATCATTAGTATCTTATATCCTTTTATAATTCTGTTTCTTTTTAT 
,| . . h I • • 

6131 6159 

6131 6i63 

6131 

6131 

6131 

6131 

>TspRX 
HpyCH4III 
>Mbo 1 1 

rsp509l >BbsX >CstHX 

MseX >Bbr 7 1 >Bsr DI <Spy AV 

II I II I II 

GACAGGTTAATTGATAGACTAATAGAAAGAGCAGAAGACAGTGGCAATGAGAGTGAAGGA 6240
CTGTCCAATTAACTATCTGATTATCTTTCTCGTCTTCTGTCACCGTTACTCTCACTTCCT 

II- • • I II- I • II • 

6187 6214 6224 6235 

6189 6214 6236 

6214 

6218 
6219 
Nla III

Fat I >Stsl 
NlalV >Fokl 
Banz Styx 
Bspl286l BsaJl 
Bme 158 01 <Cst MI 
<BccT <BccJ Cvi All >Sst F5I 

I I II I I I I 

GAAATATCAGCACTTGTGGAGATGGGGGTGGAGATGGGGCACCATGCTCCTTGGGATGTT 6300
CTTTATAGTCGTGAACACCTCTACCCCCACCTCTACCCCGTGGTACGAGGAACCCTACAA 

•I -I II • I I h I • 

6261 6273 6283 6294 

6277 6287 
6277 6289 
6278 6289 
6278 6294 
6283 6294 
6283 

Rsa I 
Csp 61 
Nla IV 

Sfcl Tsp 451 Kpnl 

Mbol Mae XXX BanX 

DpnX KRleAX <BaeX 

ChaX <CjeFX >SimX <BaeX 

Bst KTI Sfcl Tsp 5091 Hpy CH4III Acc 651 

I I I I I I II I II 

GATGATCTGTAGTGCTACAGAAAAATTGTGGGTCACAGTCTATTATGGGGTACCTGTGTG 6360
CTACTAGACATCACGATGTCTTTTTAACACCCAGTGTCAGATAATACCCCATGGACACAC 

I I * I I I I I I I • II 

6304 6315 6324 6335 6349 

6304 6320 6330 6349 

6304 6327 6349 

6304 6332 6349 

6307 6332 6349 

6349 
6350 
6350 

KSfaUX 
Hpy 1881 
>SfaNI 

Mwo I <BSC AI 
Bst API 

>Bsc AI <Bae I -Rsa I 

Apa BI <Bae I Csp 6 1 

<BpyAV Hpy CH4V NdeX <MnlX 

I II I I I I I I 

GAAGGAAGCAACCACCACTCTATTTTGTGCATCAGATGCTAAAGCATATGATACAGAGGT 6420
CTTCCTTCGTTGGTGGTGAGATAAAACACGTAGTCTACGATTTCGTATACTATGTCTCCA 

I • • II' I I • I I II- 

6361 6388 6405 6416 

6389 6410 6419 

6389 6410 6419 

6389 

6389 6395 
6389 

6392 

6395 
Mwo I
Hae III 

CvlJl Wla III 
UnbZ Ms 11 >RleAX 
Sau96I Fat I J?sa I 

FmuX Cvi All Csp6l >CjeX 
<TthlllXX NspX Hpy 81 <SimX >J?ieAI 

I III II III I II 

ACATAATGTTTGGGCCACACATGCCTGTGTACCCACAGACCCCAACCCACAAGAAGTAGT 6480
TGTATTACAAACCCGGTGTGTACGGACACATGGGTGTCTGGGGTTGGGTGTTCTTCATCA

I • I II I I I I • I I • II- 

6427 6439 6447 6458 6466 

6432 6440 6449 6467 

6432 6440 6449 

6432 6440 6452 

6433 6440 
6433 
6434 

Nla III 

Wla III Fat I 

Fat I CviAII 
Cvi All Hpy CH4V 

Pci I Ppu 101 

NspX Wsi I 

MseX Nla XXX BfrBX 

Tsp 451 Tsp 5091 Fat I <SfaNX 

Mae XXX ApoX AflXXX CviAXX <Bsc AX 

I II I I I I I II I 

ATTGGTAAATGTGACAGAAAATTTTAACATGTGGAAAAATGACATGGTAGAACAGATGCA 6540
TAACCATTTACACTGTCTTTTAAAATTGTACACCTTTTTACTGTACCATCTTGTCTACGT 

•I Mill- -I • II I I • 

6491 6499 6507 6523 6535 

6491 6500 6523 6535 

6504 6523 6536 
6507 6536 
6507 6536 
6508 6537 
6508 6539 
6508 6539 



6539 



Mbo X 

DpnX NlaXXX 



ChaX FatX 

Bst KTI Cvi All MseX >CjeX 

<MnlX >AlwX CviJX CvlJX Tsp 5091 Dra III 

I II I I I I I II 

TGAGGATATAATCAGTTTATGGGATCAAAGCCTAAAGCCATGTGTAAAATTAACCCCACT 6600
ACTCCTATATTAGTCAAATACCCTAGTTTCGGATTTCGGTACACATTTTAATTGGGGTGA 

I • •III- I h II II • 

6542 6562 6569 6576 6588 6597 

6563 6579 6590 6596 

6563 6579 
6563 6579 
6563 
>TspRI

Hpy Bl 
Bspl286I 
Bsi HKAI 

Bine 15801 <Sth 1321 

Msel Hpy CH4V <Fau I 

Dra I Apa LI >Mbo 11 >Cjel <AciX 

II II I I I II 

CTGTGTTAGTTTAAAGTGCACTGATTTGAAGAATGATACTAATACCAATAGTAGTAGCGG 6660
GACACAATCAAATTTCACGTGACTAAACTTCTTACTATGATTATGGTTATCATCATCGCC 

II II h I • ■ I • II • 

6610 6616 6628 6645 6657 

6611 6617 6657 
6616 6658 
6616 
6616 
6616 

6619 

<CjePI >Csthll >TthllllX 

I I I 

GAGAATGATAATGGAGAAAGGAGAGATAAAAAACTGCTCTTTCAATATCAGCACAAGCAT 6720
CTCTTACTATTACCTCTTTCCTCTCTATTTTTTGACGAGAAAGTTATAGTCGTGTTCGTA 

I . I . . . . I . 

6663 6678 6714 

Hpy CH4V 
Ppu 101 
Hpy CH 4 V Nsil 
<Mnll >BsgX BfrBl PsiX 

I II II I 

AAGAGGTAAGGTGCAGAAAGAATATGCATTTTTTTATAAACTTGATATAATACCAATAGA 6780
TTCTCCATTCCACGTCTTTCTTATACGTAAAAAAATATTTGAACTATATTATGGTTATCT 

I •!! • II • I • 

6723 6731 6744 6754 

6732 6744 
6744 



6745 



Dde I 

>BspCNI Hae III 



>BseMII CviJX 
CviJX Hpy 8 1 >MnlX StuX 

AluX HincXX Mae XXX Hae X 

I I III II 

TAATGATACTACCAGCTATAAGTTGACAAGTTGTAACACCTCAGTCATTACACAGGCCTG 6840
ATTACTATGATGGTCGATATTCAACTGTTCAACATTGTGGAGTCAGTAATGTGTCCGGAC 

I • I • I II • II • 

6794 6802 6813 6834 

6794 6802 6819 6834 

6820 6835 
6820 6835 
6820 
Mwo I
Cvi JI 
Hpa II 
StyD4I 
>Sth 1321 
Scr FI 
Nci I 
f:co HI 
Bsa JI 

>BciVI Tsp509I Bsp 12861 i I 

>Fal I Cvi JI Wsi I Sine 15 8 01 Hin fl 

II II I I III II I 

TCCAAAGGTATCCTTTGAGCCAATTCCCATACATTATTGTGCCCCGGCTGGTTTTGCGAT 6900
AGGTTTCCATAGGAAACTCGGTTAAGGGTATGTAATAACACGGGGCCGACCAAAACGCTA

II- I • I • I h 111 II • I • 

6845 6858 6872 6879 6898 

. 6848 6862 6879 6898 

6882 
6883 
6883 
6883 
6883 
6883 
6884 
6886 
6887 

Rsa I 
Nlaxil 
Fat I 
Cvi All 
Vpa KllAI 
Tai I Unb I Tat I 

Hpy CH4IV Sau96I RsaT 

<CjeX Psp 031 Csp 61 

<CjePI Fmul Bsr Gl Tat 1 

>Bsp 241 Avail Csp 61 Hpy CH4III 

II I I II III 

TCTAAAATGTAATAATAAGACGTTCAATGGAACAGGACCATGTACAAATGTCAGCACAGT 6960
AGATTTTACATTATTATTCTGCAAGTTACCTTGTCCTGGTACATGTTTACAGTCGTGTCA 

II • I I'll • I II- 

6919 6935 6942 6956 

6919 6935 6941 6958 

6920 6935 6959 

6920 6935 6959 

6920 6935 6941 

6935 

6939 
6939 
6939 

6942 

Nla III 
Fat I 

Rsa I <Bsr I 

Hpy 8 1 Hae III 

Csp 61 Cvi J I 

rati Cvi All Hae I 

Bsr GI Tsp 5091 Afse I Bfal 

II I I II I I I 

ACAATGTACACATGGAATTAGGCCAGTAGTATCAACTCAACTGCTGTTAAATGGCAGTCT 7020
TGTTACATGTGTACCTTAATCCGGTCATCATAGTTGAGTTGACGACAATTTACCGTCAGA

II -I I II I • • I • I- 

6965 6976 7007 7019 

6965 6971 6980 
6966 6981 
6966 6981 
6966 6983 
6971 
6971 
Mbo I
Dpn I 
Cha I 

<Mnl I Bst KTI 

>MbolZ BstYX 

<Earl BglXX >TspGVIX 

>MboXX Tsp 5Q9X Tsp 509X 

III I II I I 

AGCAGAAGAAGAGGTAGTAATTAGATCTGTCAATTTCACGGACAATGCTAAAACCATAAT 7080
TCGTCTTCTTCTCCATCATTAATCTAGACAGTTAAAGTGCCTGTTACGATTTTGGTATTA 

I I -I I'll • I I • 

7025 7039 7052 

7028 7043 7058 

7028 7043 
7031 7044 
7044 
7044 
7044 

Cvi Jl 

AluX RsaX 
PvuXX TatX 
MspAlX Tsp509X 
RsaX MseX CspSX 

CspSX AseX BsrGX 

TatX SfcX Tsp 5091 <SlmX 

I I II I I I II II I 

AGTACAGCTGAACACATCTGTAGAAATTAATTGTACAAGACCCAACAACAATACAAGAAA 7140
TCATGTCGACTTGTGTAGACATCTTTAATTAACATGTTCTGGGTTGTTGTTATGTTCTTT 

MM- I • I II I • I I I • 

7081 7098 7105 7119 

7082 7106 7112 

7082 7107 7113 

7085 7109 
7085 7112 
7086 7113 
7086 

Sty D4I 

ScrFX 

Psp GI 

BstNI 
Vpa KllAI 
Unb X 
Sau 961 
Psp 031 
Fmu I 

<rspGWI Avail 
Tfi X >Bci VI <Mnl I 

HintX Hpy 188III BsaJX MaeXXX 

I I I I III I 

AAGAATCCGTATCCAGAGAGGACCAGGGAGAGCATTTGTTACAATAGGAAAAATAGGAAA 7200
TTCTTAGGCATAGGTCTCTCCTGGTCCCTCTCGTAAACAATGTTATCCTTTTTATCCTTT 

I I I- I III • I • 

7143 7152 7163 7178 

7143 7149 7158 
7146 7160 
7160 
7160 
7160 
7160 
7160 

7163 
7163 
7163 
7163 
>Tth 111II Mse I
<BsmAl MaelXl DraX 

III II 

TATGAGACAAGCACATTGTAACATTAGTAGAGCAAAATGGAATAACACTTTAAAACAGAT 7260
ATACTCTGTTCGTGTAACATTGTAATCATCTCGTTTTACCTTATTGTGAAATTTTGTCTA 

II- I • • • H 

7204 7218 7249 

7208 7250 

Hpy 188III 

Bfa I Dde I 

NheX >Bsp CNI 

Cac 81 >BseMII 

BmtX >Mnll 

Cvi Jl Msel Bsu36I 

Alul rsp509I rsp509I Msel Bsl I 

III I I I I III 

AGCTAGCAAATTAAGAGAACAATTTGGAAATAATAAAACAATAATCTTTAAGCAATCCTC 7320
TCGATCGTTTAATTCTCTTGTTAAACCTTTATTATTTTGTTATTAGAAATTCGTTAGGAG 

III l-l •! • • I • III- 

7261 7269 7281 7308 7317 

7261 7271 7317 

7262 7317 
7262 7318 
7262 7318 
7263 7318 

7319 

<Sim I 

Vpa KllAI 

Unbl 

Sau 961 

Psp 031 

Nla IV 

Fmu I 

Avail 
San DI 
Pss I 
PpuMI 
Nla IV 

£coO109I rsp509I rsp509I 

>BSinFI Tsp 5091 Hpy CH4III Apo I 

<MnlX >CjeX Mae III Msel <MnI I HpyCH4III 

I III I I I I II III I 

AGGAGGGGACCCAGAAATTGTAACGCACAGTTTTAATTGTGGAGGGGAATTTTTCTACTG 7380
TCCTCCCCTGGGTCTTTAACATTGCGTGTCAAAATTAACACCTCCCCTTAAAAAGATGAC 

I III -I II I - I I - I II • I - 

7323 7331 7340 7353 7362 7377 

7326 7336 7347 7367 

7326 7355 7368 

7326 
7326 
7326 
7326 

7327 

7327 

7327 

7327 

7327 

7327 

7327 
7328 
>Eco 57MI
>Acu I 

Tat I Rsa I Rsa I 

Seal rati Csp SI 

Msel Rsal ScaX rati >Sini I 

rsp 5091 Hpy CH4III Csp 61 i>lse I Csp 61 Seal <Hpy AV 

I I I II I II II II I 

TAATTCAACACAACTGTTTAATAGTACTTGGTTTAATAGTACTTGGAGTACTGAAGGGTC 7440
ATTAAGTTGTGTTGACAAATTATCATGAACCAAATTATCATGAACCTCATGACTTCCCAG 

I • I I. • II -I II- II -I I I • 

7382 7393 7404 7413 7419 7427 7433 

7398 7404 7418 7427 7436 

7403 7418 7428 

7403 7419 7428 

7431 
7431 

<HpyAV Nlalll Pcil 

>Eco blMl >MnlT Hpy CH4V Nspl 

>Acul rsp 451 <Ssp D5I Fat I Psi I Afi III 

>rspRI Mae 111 <Hphl Cvi All Tsp 50 91 

III I III! Ml 

AAATAACACTGAAGGAAGTGACACAATCACCCTCCCATGCAGAATAAAACAAATTATAAA 7500
TTTATTGTGACTTCCTTCACTGTGTTAGTGGGAGGGTACGTCTTATTTTGTTTAATATTT 

I hi I • 1-1 II- /II I 

7447 7458 7467 7476 7492 

7449 7458 7467 7476 7494 7500 

7449 7471 7478 7500 

7451 7476 7500 
Mwo I 

Nia III fist API >Bcc'L 

Fatl >BsrDl Bsll HpyBl BsaBZ 

Cvi ATI ApaBI >Mnll >TspRl Tsp 5091 

I I I II II I I 

CATGTGGCAGAAAGTAGGAAAAGCAATGTATGCCCCTCCCATCAGTGGACAAATTAGATG 7560
GTACACCGTCTTTCATCCTTTTCGTTACATACGGGGAGGGTAGTCACCTGTTTAATCTAC 

I • • I • I II- I I • I I • 

7501 7523 7535 7543 7552 

7501 7523 7538 7545 7557 

7501 7523 7539 

7523 

Tse I 

Fnu4HI Hpy 1881 

Bth CI >PleI 
Sspl <Bbvl <Cjel >Mlyl 

<TspDTl Cvi Jl Msel <Bccl Hin tl 

II II I I I II 
TTCATCAAATATTACAGGGCTGCTATTAACAAGAGATGGTGGTAATAGCAACAATGAGTC 7620
AAGTAGTTTATAATGTCCCGACGATAATTGTTCTCTACCACCATTATCGTTGTTACTCAG 

I I • II- I I- I - • I I- 

7561 7578 7586 7595 7616 

7568 7579 7589 7616 

7579 7616 

7579 7619 

7579 
Hpy 188I <Mnl I
<Eco S7MZ >BseRI 
<ACUI >Bco 5 7MI 
<WJboII >Bpml 
Mbol Sty D4 I 

Dpn I ScrFl 
Chal PspGl 

Sst KTI EcoUl MfeX 
Bst Yl Bst NI <Mnll >BsmFX 

Bglll BslX >Bse RI <Mnl 1 Tsp509l Tsp 5091 

II II I II i I 11 II I 

CGAGATCTTCAGACCTGGAGGAGGAGATATGAGGGACAATTGGAGAAGTGAATTATATAA 7680
GCTCTAGAAGTCTGGACCTCCTCCTCTATACTCCCTGTTAACCTCTTCACTTAATATATT

llllh II I -I -I I II • -I 

7623 7634 7641 7651 7658 7671 

7623 7634 7641 7653 

7624 7634 7657 

7624 7634 
7624 7634 
7624 7634 
7626 7635 
7627 7635 
7627 7638 
7629 7638 

>Cje I Sty I >Mbo II 

Tsp 5091 Bsa JI <Earl 

II II 

ATATAAAGTAGTAAAAATTGAACCATTAGGAGTAGCACCCACCAAGGCAAAGAGAAGAGT 7740
TATATTTCATCATTTTTAACTTGGTAATCCTCATCGTGGGTGGTTCCGTTTCTCTTCTCA 
I . I . . I , I . 

7696 7722 7734 

7703 7722 7734 

Tse I 
Fnu 4HI 

Hpy CH4V >rsp RI CvlJZ Sty I Bth CT 

>Bsgl >Btsl AluX Bsa JI >Bbv Z 

M M II I 

GGTGCAGAGAGAAAAAAGAGCAGTGGGAATAGGAGCTTTGTTCCTTGGGTTCTTGGGAGC 7800
CCACGTCTCTCTTTTTTCTCGTCACCCTTATCCTCGAAACAAGGAACCCAAGAACCCTCG 

II • II • I • I • I- 

7742 7760 7774 7783 7799 

7743 7761 7774 7783 7799 

7799 
7799 

<Hgra I 

Tse I Hae III 

FiJu4HI Cvi JZ 

Bth CZ Hae I 

>Bbv I Rsa I 

MwoZ Hin PlI Csp 61 

<CjePZ HhaZ >HgaI Hpy CH4 1 II rsp50 9I 

I I II I I I I II I 

AGCAGGAAGCACTATGGGCGCAGCGTCAATGACGCTGACGGTACAGGCCAGACAATTATT 7860
TCGTCCTTCGTGATACCCGCGTCGCAGTTACTGCGACTGCCATGTCCGGTCTGTTAATAA 

II- III •! I -I II • I 

7806 7818 7831 7838 7854 

7809 7818 7841 

7820 7841 
7820 7845 
7820 7846 
7820 7846 
7823 
Tse I
Fnu 4H1 
BthCX 

>Bbvl <MnlX 
Tse I Dde I Mwo I >Sfa NI 

Fnu 4HI <Bsp CNI Wvo I 

Bth CI <BseMII Bst API 

>Bi>vI <BpulOI HinPlI >BscAI 

HpyCH4V <Bbv CI Hiia I Apa BI 

>BsgI Tsp 5091 Cvi JI <MnI I AlvNI 

I I I I I I I I I I II M 

GTCTGGTATAGTGCAGCAGCAGAACAATTTGCTGAGGGCTATTGAGGCGCAACAGCATCT 7920
CAGACCATATCACGTCGTCGTCTTGTTAAACGACTCCCGATAACTCCGCGTTGTCGTAGA 

-Mil'- I •Mill- 11-11 

7871 7886 7897 7904 7913 

7872 7891 7907 7915 

7873 7891 7907 7915 

7873 7892 7915 

7873 7892 7915 

7873 7892 7898 7915 

7876 7894 

7876 

7876 

7876 

Sty D4 1 

Scr FI 

Psp GI 
<Eco 57MI 
<Bpm I 
Cvi JI 

Tse I Bst NI Sty D4I 

Fnu 4HI SCJT FI 

BtiiCI PspGI 
>BbvI Bst NI 

>SfaHX AluX BslX 
HpyCHAXXX >TthlllXX TfiX 

Hpy CH4V >Bsc AI >Ace III HlntX Cvi JX 

I I I I III I I III 

GTTGCAACTCACAGTCTGGGGCATCAAGCAGCTCCAGGCAAGAATCCTGGCTGTGGAAAG 7980
CAACGTTGAGTGTCAGACCCCGTAGTTCGTCGAGGTCCGTTCTTAGGACCGACACCTTTC 

I -I 'I I III I I • I I I- 

7923 7941 7949 7962 7969 

7931 7945 7962 

7941 7950 7966 

7948 7966 
7948 7966 
7948 7966 
7948 7954 7966 
7950 
7952 
7952 
7954 
7954 
7954 
Sty D4I
ScrFZ 
PspGX 
Bstmx 

<Cje I 
Mbol CvlJl 
. Dpn X Alu X 
ChaX AlvNI 
BstKTX BsaJX 
>AlwX >AceXXX Hpy 188III Hpy CH4V 

II III I I I 

ATACCTAAAGGATCAACAGCTCCTGGGGATTTGGGGTTGCTCTGGAAAACTCATTTGCAC 8040
TATGGATTTCCTAGTTGTCGAGGACCCCTAAACCCCAACGAGACCTTTTGAGTAAACGTG

II III • I • -I • I • 

7990 7997 8021 8036 

7991 8002 
7991 7997 
7991 7998 
7991 7998 
7996 

8002 
8002 
8002 
8002 

>TspRX >Hln^X 

Mslx >B8mX >HaeIV <Cje I 

<BtsX Styx KMzneX <CjePX TfiX 

Ale I BsaJX BfaX HpylBBXXX HlntX 

I I I I I ^ III II 

CACTGCTGTGCCTTGGAATGCTAGTTGGAGTAATAAATCTCTGGAACAGATTTGGAATCA 8100
GTGACGACACGGAACCTTACGATCAACCTCATTATTTAGAGACCTTGTCTAAACCTTAGT

1 -I I -I I • I I I- I I 

8041 8051 8061 8080 8095 

8041 8051 8064 8084 8095 

8041 8056 8089 8100 

8041 8089 
<Bcc I 

>StsX 

>Fok X 

>Bst F5I 

Sty D4I MseX 
ScrFl rsp509I CviJX 

PspGl MseX Alu I MseX 

BstNI >BsmFX Tsp5Q9X Hin dXXX <CstHX 

I II I III III II 

CACGACCTGGATGGAGTGGGACAGAGAAATTAACAATTACACAAGCTTAATACACTCCTT 8160
GTGCTGGACCTACCTCACCCTGTCTCTTTAATTGTTAATGTGTTCGAATTATGTGAGGAA

I II I • II I -III' II- 

8106 8118 . 8128 8143 8155 

8106 8130 8144 8159 

8106 8135 8144 

8106 8147 
8109 
8109 
8109 
8110 
Tfi I 
Hln f I 
>MboXX 

Tsp 5091 MwoX >TspDTX Tsp509l Tsp 5091 

I I I I I I I 

AATTGAAGAATCGCAAAACCAGCAAGAAAAGAATGAACAAGAATTATTGGAATTAGATAA 8220
TTAACTTCTTAGCGTTTTGGTCGTTCTTTTCTTACTTGTTCTTAATAACCTTAATCTATT 

I I I • I . ■ -I -I -I 

8161 8173 8193 8202 8211 . 

8165 

8168 

8168 
Mse I Cvi JI <Tsp DTI

Tsp 50 91 Tsp 5 0 91 Tsp 50 91 

I I II II 

ATGGGCAAGTTTGTGGAATTGGTTTAACATAACAAATTGGCTGTGGTATATAAAATTATT 8280
TACCCGTTCAAACACCTTAACCAAATTGTATTGTTTAACCGACACCATATATTTTAATAA 

I • I • I I- • I I- 

8237 8255 8274 

8244 8259 8279 

Cv± JI 

<Mnl I Rsa I 

<BsaXl Csp6X 
<BsaXX Afsel rati Sfcl 

II I I III 

CATAATGATAGTAGGAGGCTTGGTAGGTTTAAGAATAGTTTTTGCTGTACTTTCTATAGT 8340
GTATTACTATCATCCTCCGAACCATCCAAATTCTTATCAAAAACGACATGAAAGATATCA 

III- I • • I I • I 

8294 8309 8326 8334 

8294 8327 
8295 8327 
8297 

<Mnl I 
Bsa JI 
>Sth 1321 

<SspD5I <Si2nI Nil 38771 

<Hph I Hpy 18 81 >Mnl I Ava I 

I III III 

GAATAGAGTTAGGCAGGGATATTCACCATTATCGTTTCAGACCCACCTCCCAACCCCGAG 8400
CTTATCTCAATCCGTCCCTATAAGTGGTAATAGCAAAGTCTGGGTGGAGGGTTGGGGCTC 

• I • II I • II I • 

8363 8377 8386 8395 

8363 8380 8395 

8395 
8396 
8398 

>Sth 1321 
<Slm I 



Vpa KllAI 








Unb I 








Sau 961 








Psp 031 








Nla IV 








Fmu I 








Ava II 


<Hpy AV 




MJbo I 


San DI 


>Sth 1321 




Z)pn I 


Pss I 


Unb I 




Cha I 


Ppu MI 


Sau 961 




Bst KTI 


Nla IV 


Hae III 


<HpyAV 


<AlwJ 


fTco O109I 


Finu I 


>Mbo II 


<BsinAl 


>Bsm FI 


CviJX 


>Mbo II 


<Bsm AT Bst Yl 



1 1 II III III I III 

GGGACCCGACAGGCCCGAAGGAATAGAAGAAGAAGGTGGAGAGAGAGACAGAGACAGATC 8460
CCCTGGGCTGTCCGGGCTTCCTTATCTTCTTCTTCCACCTCTCTCTCTGTCTCTGTCTAG 



III 1 


• 1 1 1 • 


1 I- 1 


1 -1 II • 


8401 


8412 


8426 


8445 8456 


8401 


8412 


8429 


8451 


8401 


8412 


8432 


8457 


8401 


8412 




8457 


8401 


8412 




8457 


8401 


8414 




8457 


8402 


8417 




8457 


8402 








8402 








8402 








8402 








8402 








8402 








8403 








8405 
Sty I

BsaJX 
Mbo I 
Dpn I 
Cha I 
Bst KTI 
NialV 

BstYX MboX 

BamHZ Dpn X <Mbo XX 

>AlwX ChaX NlaXV >EarX 

rag I HpyQX <AlwX Bst KTI Cvi JX >MnlX 

Es a BC3X >rspGWi >BsmFX <AciX MwoX 

I I I 1 1 I I I III 1 1 1 1 

CATTCGATTAGTGAACGGATCCTTGGCACTTATCTGGGACGATCTGCGGAGCCTGTGCCT 8 52 0 
GTAAGCTAATCACTTGCCTAGGAACCGTGAATAGACCCTGCTAGACGCCTCGGACACGGA 

I -I I II -I • I -I III MM 

8464 8475 8496 8506 8517 

8464 8471 8478 8501 8510 8518 

8477 8501 8508 8519 

8477 8501 8520 

8477 8501 
8477 
8478 
8478 
8478 
8478 

8481 
8481 

MwoX <BsmAX 
Cvi J I . >Bpl X 

AluX Sml I 

<Bco 57MI >BpuEX MaeXXX 

<AcuX >AciX >BplX Hpy ISSXXX <Mnl X 

I II III I II 

CTTCAGCTACCACCGCTTGAGAGACTTACTCTTGATTGTAACGAGGATTGTGGAACTTCT 8 5 80 
GAAGTCGATGGTGGCGAACTCTCTGAATGAGAACTAACATTGCTCCTAACACCTTGAAGA 

1 II • I I -I I I • I 

8521 8533 8541 8550 8563 

8521 8536 8558 

8525 8536 

8525 8541 
8526 8541 

>PleX 
>MlyX 

>HgaX >MnlX TfiX Hpy CH4III Hpy 188III 

>BsmFX Cvi J I 5spl Hln f I SfcX HlntX 

I I III I II II 
GGGACGCAGGGGGTGGGAAGCCCTCAAATATTGGTGGAATCTCCTACAGTATTGGAGTCA 8 6 4 0 
CCCTGCGTCCCCCACCCTTCGGGAGTTTATAACCACCTTAGAGGATGTCATAACCTCAGT 

II • Ml I • I • I I • II- 

8581 8599 8607 8617 8624 8635 

8583 8602 8617 8626 8638 

8635 
8635 
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Dde I 

<Bsp CNI 

<Bse Mil 
<Bpu 101 
<Si5vCI 

CacBX Cvi JI 

Cvi JI AIul >BsniFI 

Alul Cvi JI AlvNI <W/3i I 

II I I MM I 

GGAACTAAAGAATAGTGCTGTTAGCTTGCTCAATGCCACAGCCATAGCAGTAGCTGAGGG 87 0 0 
CCTTGATTTCTTATCACGACAATCGAACGAGTTACGGTGTCGGTATCGTCATCGACTCCC 

• II • I I • III I I- 
8663 8680 8688 8696 
8663 8692 8699 

8664 8692 

8693 
8693 

8694 

8694 

8694 

>Cst MI 
Rsa I Mwo I 

Csp6I Cvi JI Cvi JI >MboTX 

rati AluI Alt2l BfaX 

I I I I I I II 

GACAGATAGGGTTATAGAAGTAGTACAAGGAGCTTGTAGAGCTATTCGCCACATACCTAG 87 6 0 
CTGTCTATCCCAATATCTTCATCATGTTCCTCGAACATCTCGATAAGCGGTGTATGGATC 

• II I -11 I . • II 
8722 8731 8740 8757 

8723 8731 8740 8760 

8723 8732 
8727 

Cvi JI 
<Cje I 
<CjePI 

>Bsp 2 41 <Bccl 

III I 
AAGAATAAGACAGGGCTTGGAAAGGATTTTGCTATAAGATGGGTGGCAAGTGGTCAAAAA 8 82 0 
TTCTTATTCTGTCCCGAACCTTTCCTAAAACGATATTCTACCCACCGTTCACCAGTTTTT 

III - • I • 

8769 8798 
8769 
8770 

8774 

Cvi JI 
Dde I 
Mwo I Tse I 

Bipl Fnu 4HI 

Cvi JI Cvi JI Bth CI 

<Bcc I Aiul Cac 81 

>Stsl <Ace III >BJbvI 

>Fokl <BsmBl <Bsp CNI 

>Bst F5I Hpy CH4III <BSinAI <BseMII <BCC I 

II I I I MM II I I 

GTAGTGTGATTGGATGGCTTACTGTAAGGGAAAGAATGAGACGAGCTGAGCCAGCAGCAG 8 8 8 0 
CATCACACTAACCTACCGAATGACATTCCCTTTCTTACTCTGCTCGACTCGGTCGTCGTC 

- II I M - I - MM II I I 

8832 8841 8858 8866 8880 

8832 8858 8866 

8832 8863 8874 

8833 8864 8870 

8836 8864 8874 

8865 8874 
8865 8874 
8866 

8869 
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<B3m AI 
<Bsa I 
Tag I 

Hpy 188III 
Xhol 

Sml I 

SciX StyD4I 
>sraNI Scr FI 

>BSC AI Psp GI 

rsel Hpy 1 8 8III 

Fnu4HI fTsa BC3I Nia III 

BthCI Nli3S7lX Fat 1 

>Bbvl Aval Bst NI Cvi All 

I I I I I I I I 
ATGGGGTGGGAGCAGCATCTCGAGACCTGGAAAAACATGGAGCAATCACAAGTAGCAACA 8 9 4 0 

TACCCCACCCTCGTCGTAGAGCTCTGGACCTTTTTGTACCTCGTTAGTGTTCATCGTTGT 
• I I I I I I I • I • 



8892 


8899 


8906 


8916 


8892 


8899 




8916 


8892 


8900 




8916 


8892 


8898 






88 


95 


8906 




88 


95 


8906 






8899 


8906 






8899 








8899 








8900 








8900 








8902 








8902 







MWO I 

Cvi J I <TthlllII BfaZ <MnlX 

Alul TseT Cvi Jl >BseRl 

Tsel MwoZ Sty D4 1 <Mnll 

Fnu4HI Fnu 4HI ScrFI >BseRI 

Bth CI Bth CI PspGI <WnI I 

>BJbvI <BJbvI Bst NI >Bse RI <MnlZ 

III II III I I I I 
CAGCAGCTACCAATGCTGCTTGTGCCTGGCTAGAAGCACAAGAGGAGGAGGAGGTGGGTT 9 0 00 

GTCGTCGATGGTTACGACGAACACGGACCGATCTTCGTGTTCTCCTCCTCCTCCACCCAA 

III - II- III • I I I -I 

8943 8955 8965 8982 8991 

8943 8955 8965 8982 

8943 8955 8965 8985 

8943 8955 8965 8985 

8945 8955 8968 8988 

8945 8957 8970 8988 
8946 
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Rsa I 
Csp 61 

NlalV SfcX 

Kpn 1 Cvi J I 

X?del Alul Ddel 

>BspCNI Pvull Mbol 

>dseHII MspAll Dpnl 

>Mnl I Tse I Cha I 

Tsp 451 Bail I FilU 4HI Bst KTI 

Waelll Acc 651 Bth CI Bst YI 

<BsrZ Bsu 3 61 Msel >Bbvl Bcfl 11 Cvi JX 

II II II I III I II I I 

TTCCAGTCACACCTCAGGTACCTTTAAGACCAATGACTTACAAGGCAGCTGTAGATCTTA 9 0 6 0 
AAGGTCAGTGTGGAGTCCATGGAAATTCTGGTTACTGAATGTTCCGTCGACATCTAGAAT 

I I • II II * I • - MM' II I I 

9003 9012 9024 9045 9053 9060 

9006 9017 9045 9053 

9006 9017 9045 9054 

9012 9045 9054 

9013 9046 9054 

9013 9046 9054 

9013 9047 9057 

9017 9047 
9017 9049 
9018 
9018 

>Fal I 
>Hin 41 
>Hae IV 

<HpyAV >Mbo 11 

Msel >Bsrl rsp5 09I >Bb3Z 

Bra I >Bsm F I Cvi Jl >Bbr 1 1 Eco RV 

II I I I I I I I I I 

GCCACTTTTTAAAAGAAAAGGGGGGACTGGAAGGGCTAATTCACTCCC AAAGAAGACAAG 912 0 
CGGTGAAAAATTTTCTTTTCCCCCCTGACCTTCCCGATTAAGTGAGGGTTTCTTCTGTTC 

II- • I I I I I • * I I II 

9068 9083 9094 9112 9120 

9069 9086 9098 9112 

9090 9112 

9115 
9115 

9118 

Mbo 1 

Dpn I 
Mbo I Cha I 
Dpnl Bst Yl 
Chal >Alwl 

Bst KTI Bst KTI Cvi Jl 

III \ 
ATATCCTTGATCTGTGGATCTACCACACACAAGGCTACTTCCCTGATTAGCAGAACTACA 918 0 
TATAGGAACTAGACACCTAGATGGTGTGTGTTCCGATGAAGGGACTAATCGTCTTGATGT 

|. II . . I . 

9129 9137 9153 

9129 9136 
9129 9136 
9129 9137 

9137 

9137 
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StyD4l 
ScrFX 
PspGX 
Bst NX 
Hae III 
Cvi JI 
Unbl 
5ai2 96l 
Finu I 
Sty D4I 

Scr FI i?sa I 

PspGI >Sts I Csp 61 

Bst NI FcoRV >FoA: I Bfa I 

Bsll Bsa Jl HpylBSl Bsll <BccZ Cvi JI 

Bsa JI >Siml >TspRl >Bst F5I Alu 1 <Bsrl 

I II I I I I I III I I I I 

CACCAGGGCCAGGGGTCAGATATCCACTGACCTTTGGATGGTGCTACAAGCTAGTACCAG 9 2 4 0 
GTGGTCCCGGTCCCCAGTCTATAGGTGACTGGAAACCTACCACGATGTTCGATCATGGTC 

I II h I I !• I -I II • l-l I I • 

9183 9193 9205 9216 9229 9237 

9183 9189 9196 9211 9217 9229 

9183 9199 9216 9231 

9183 9216 9234 

9183 9234 
9183 

9186 
9186 
9186 
9187 
9187 
9189 
9189 
9189 
9189 

Hae III 
Cvi JI 
Hae I 

<Mnl I Mae III 

>MboXl Cvi J J 

Cvi Jl <EarX >CstMX Alu X Dra XXX 

I I I I I I III 

TTGAGCCAGATAAGATAGAAGAGGCCAATAAAGGAGAGAACACCAGCTTGTTAC ACCCTG 9 3 0 0 
AACTCGGTCTATTCTATCTTCTCCGGTTATTTCCTCTCTTGTGGTCGAACAATGTGGGAC 

I • I -III •! -Ill 

9244 9258 9271 9285 9294 

9258 9285 

9261 9290 
9262 
9263 
9263 
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sty 0^1 
<BccX >Sthl32I 
NlaXll ScrFl 
Fat I >Sts I 

Cvl All >FokX HpalX >AciI 
Hpy CH4V >Bst F5I Tau X 

CacQX >StsX NciX FnuAHX 

CviJX >FokX <SimX BthCX 

<Cje'PX >Bst F5X SCO HI <MnlX CviJX 

III I I II I I II I III 

TGAGCCTGCATGGGATGGATGACCCGGAGAGAGAAGTGTTAGAGTGGAGGTTTGACAGCC 93 6 0 
ACTCGGACGTACCCTACCTACTGGGCCTCTCTCTTCACAATCTCACCTCCAAACTGTCGG 

III I I- II I -I II • • I • llh 

9302 9313 9323 9347 9357 

9303 9313 9321 9358 

9304 9313 9323 9358 

9307 9317 9358 

9309 9317 9324 9359 

9309 9317 
9309 9323 
9314 9323 
9323 

<Sts I 
<Fok X 

OstFSX 
Hpy CH4V 
UnbX TseX HpylQBXXX 

Sau96X Fnu4UX 
HaeXXX BthCX RsaX 
Fmu X <BbvX CspSX 

TaiX >Sthl32X >Sfa}>lX Tat X 
HpyCH4IV CviJX BspBX 
PmlX Nli3B77X >BscAX ScaX 
BsaKX AvaX AluX BsaViX 
BfaX <TspDTX CviJX <AceXXX HpaXX HpylBBXXX >Cdi X 

I I 11 II I II II ri I I I I I 

GCCTAGCATTTCATCACGTGGCCCGAGAGCTGCATCCGGAGTACTTCAAGAACTGCTGAC 94 20 
CGGATCGTAAAGTAGTGCACCGGGCTCTCGACGTAGGCCTCATGAAGTTCTTGACGACTG 

I I II II III • I II M II I * I 

9363 9370 9380 9387 9396 9406 9420 

9375 9382 9388 9395 
9375 9382 9392 9400 

9376 9388 9395 

9376 9382 9392 9400 

9380 9389 9401 

9380 9389 9401 

9380 9389 
9380 9389 9395 

9391 
9393 
9393 
9393 
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Cac 81 
Cvi JI 
Alu I 

Tag I 
Esa BC3I 
I I 



>BsniFl 



>BsmFZ 
<Bse YI 
Msp All 
>Aci I 



<Mnl I 
5ty D4I 
5crFI 
PspGl 
Bst NX 
Bsa JI 
I 



Sty D4I 
Scr FI 
Psp GI 
BstNI 
Bsa JI 
Hae III 
Cvi JI 
Hae I 
II I 



>BsinFI 
<Sth 1321 
<Fau I 
<Aci I 



I I 



ATCGAGCTTGCTACAAGGGACTTTCCGCTGGGGACTTTCCAGGGAGGCGTGGCCTGGGCG 
TAGCTCGAACGATGTTCCCTGAAAGGCGACCCCTGAAAGGTCCCTCCGCACCGGACCCGC 



9480 



9422 

9422 

9425 
9425 
9426 



9437 



9445 
9445 
9447 



>Bsr I 
>Bmr I 



9451 



Mbo I 
Dpn I 
Cha I 
Bst KTI 
<AlvI 
Hpy 1881 
Dde I 
>Bsp CNI 
>i»fni I 
CvlJI Bst YI 
Bsp 12861 
Ban II AlvNI 
Cac 81 >Bse MI I ffpy CH4V 
III II I I I I I 



9459 
9459 
9459 
9459 
9459 



9464 



I I 
9470 

9471 

9471 
9473 
9473 
9473 
9473 
9473 



I I 
9478 
9478 

9479 
9480 



Tse 1 
Fnu 4HI 
BthCI 
<BhvZ 
Cvx JI 
Alu I 
Pvu II 
Msp All 
Tse I 
Fnu 4 HI 
Bth CI 
>Bbv I 
II II 



>Bsr I 
>Bmr I 
Rsa I 
Csp 61 
Tat I 
Bsl I 
I II I 



GGACTGGGGAGTGGCGAGCCCTCAGATCCTGCATATAAGCAGCTGCTTTTTGCCTGTACT 
CCTGACCCCTCACCGCTCGGGAGTCTAGGACGTATATTCGTCGACGAAAAACGGACATGA 



9540 



9483 
9483 



I II llllll 
9494 9501 
9496 9503 
9496 
9497 9504 
9500 
9501 
9501 
9502 

9505 
9505 
9505 
9505 
9505 



9510 



II I I 
9519 
9519 
9519 
9519 
9520 
9520 
9521 
9521 
9522 
9522 
9522 
9522 



I II I 
9533 
9535 
9536 
9536 
9538 
9538 
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Cvi JI 
Alu I 
Ddel Sad 
<Bsp CNI SCO ICRI 
<BseMII Bspl286I 
Hpy 18 81 Bsi HKAI 
MboX Sty D41 
Dpn I ScrFl 
Cha X Psp GI 

>Bs/nAI Bst KTI Bst NI >rspRI 

>Bsa I Bst YI Bsa JI Bfal <Bts X 

>SimX BglXX Cvi JX Ban XX CvlJX NlaXW 

III II II II II I II I 

GGGTCTCTCTGGTTAGACCAGATCTGAGCCTGGGAGCTCTCTGGCTAACTAGGGAACCCA 9 6 0 0 
CCCAGAGAGACCAATCTGGTCTAGACTCGGACCCTCGAGAGACCGATTGATCCCTTGGGT 

III • II II II- II • I I- I I- 

9541 9560 9567 9574 9583 9593 

9542 9560 9569 9589 9599 

9543 9561 9569 9599 



II II 1 


I- 


9560 9567 


9560 


9569 


9561 


9569 


9561 


9569 


9561 


9569 


9561 


9569 


9563 




9564 




9564 




9564 





9574 
9574 
9574 
9574 
9575 
9575 

Mwo I 
Cac 81 
Cvi JI Cvi JI 

Afsel Alu I SmlX >Sth 1321 

Sjnl I >MnlX HindXXX Bsp 12861 

AflXX >FalX >BpuEX B/ne 15801 

I I I I I I I I I II 
CTGCTTAAGCCTCAATAAAGCTTGCCTTGAGTGCTTCAAGTAGTGTGTGCCCGTCTGTTG 96 6 0 

GACGAATTCGGAGTTATTTCGAACGGAACTCACGAAGTTCATCACACACGGGCAGACAAC 

II II III I I ' - II 
9604 9618 9626 9647 
9604 9610 9618 9647 

9605 9619 9626 9650 

9608 9619 
9620 

9624 
Dde I 
>Bsp CNI 
>BseMII 

Mbo X 
Dpn X 

<Ple I Cha X 

<MlyX Bst KTI 

Bin fx <AlwX <SimX 

Tsp 451 Bjfa I >MnlX 

MaeXXX MaeXXX BstYX Hpy 1Q8X >TspRX BfaX 

M I I M I I I I I I 

TGTGACTCTGGTAACTAGAGATCCCTCAGACCCTTTTAGTCAGTGTGGAAAATCTCTAGC 9 7 2 0 
ACACTGAGACCATTGATCTCTAGGGAGTCTGGGAAAATCAGTCACACCTTTTAGAGATCG 

II •! I M III h -I • I • 

9662 9671 9679 9686 9701 9716 

9662 9675 9684 

9664 9680 9689 

9664 9680 
9664 9680 
9680 
9680 

9685 
9685 
9685 



HXB2 -> Restriction Map 5/02/07 16:22:16 Page 



A 9721 
T 



/// 



Journal of Virology, May 1991, p. 2415-2421
0022-538X/91/052415-07$02.00/0

Copyright © 1991, American Society for Microbiology



Vol. 65, No. 5



A Single-Stranded Gap in Human Immunodeficiency Virus 
The structure of unintegrated human immunodeficiency virus type 1 (HIV-1) DNA from acutely infected
human lymphoid cells was analyzed by nuclease S1 cleavage. We observed a unique, discrete single-stranded
gap in unintegrated linear DNA molecules, located near the center of the genome. Oligonucleotide primer 
extension experiments determined that the downstream limit of this gap coincides with the last nucleotide of a 
central copy of the polypurine tract found in all sequenced lentivirus genomes. Other retroviruses have only one 
copy of the polypurine tract at the 5' boundary of the 3' long terminal repeat, which has been shown to 
determine initiation of retroviral DNA plus-strand synthesis. We conclude from our observations that the 
central repeat of the polypurine tract can create an additional site for plus-strand synthesis initiation in 
lentiviruses. The central single-stranded gap was not found in circular DNA molecules, the vast majority of
them carrying only one long terminal repeat. This finding suggests that the generation of such circular 
molecules is associated with early DNA ligation events.



Retroviruses replicate through reverse transcription of 
their RNA genomes into a double-stranded DNA molecule. 
Retroviral genes are expressed from an integrated copy of 
this double-stranded DNA genome, the provirus. Both 
strands of the retroviral DNA genome are synthesized by the 
virus-encoded reverse transcriptase, which has both RNA- 
and DNA-dependent DNA polymerase activities (10, 38, 41).
The template for minus-strand synthesis is viral genomic 
RNA, and the template for plus-strand synthesis is the newly 
reverse-transcribed minus strand, following removal of RNA 
from the RNA-DNA hybrid by the RNase H activity associated
with reverse transcriptase (8, 11). Synthesis of the
minus strand and subsequently of the plus strand is initiated 
near the 5' end of the respective template. The correspond- 
ing short segments (minus- and plus-strand strong-stop 
DNAs) are further transferred to the other end of the 
template, resulting in the formation of the long terminal 
repeats (LTRs), present at each end of the provirus (10, 30, 
37). The primer for minus-strand synthesis is the 3' end of a 
tRNA molecule packaged in the viral particle together with 
genomic viral RNA and hybridized to the primer binding site 
located at the 3' boundary of the U5 region (35, 36, 39). The 
initiation site of the plus strand has been deduced from 
analysis of reverse transcription reaction products obtained
either in vitro or from detergent-disrupted virions and from 
the sequences of proviral molecular clones. This site is 
located immediately 3' of a polypurine tract (PPT) representing
the 5' boundary of the U3 region (21, 22) (Fig. 1). It has
been proposed that the PPT is used to define an RNA primer 
by specific cleavage of the RNA template at this site by the 
reverse transcriptase-associated RNase H (20, 23, 27, 32).
Indeed, in murine retroviruses, in vitro reverse transcription 
reactions reveal RNA primers that remain associated with 
the elongating plus strand and are heterogeneous in length 
(7, 26). Unlike most retroviruses, human immunodeficiency
viruses (HIV) and other lentiviruses have two copies of the
PPT, one at the border of the 3' LTR and the other located 
near the middle of the genome, within the pol coding region 
(5, 12, 33, 43). Previous experiments on visna virus, the 
prototypic lentivirus, have shown that unintegrated DNA 
molecules display a single-stranded gap located approxi- 
mately in the same area (3, 14). Other viral DNA genomes 
carry single-stranded gaps or nicks. This is the case for 
hepatitis B virus and cauliflower mosaic virus, in which 
synthesis of viral DNA genome involves a reverse transcrip- 
tion step. 
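Both copies of the PPT are, as the name says, runs of purines (A and G) on the plus strand. The sketch below treats the PPT as simply the longest uninterrupted A/G run in a window. That is a heuristic assumption, since a real PPT is defined by its sequence and function, not only by purine content. It is applied to the HXB2 window 9061-9120 copied from the restriction-map printout that precedes this article. The function name `longest_purine_run` is illustrative.

```python
import re
from typing import Optional, Tuple


def longest_purine_run(seq: str, start: int = 1) -> Optional[Tuple[int, str]]:
    """Return (1-based start position, run) of the longest A/G-only stretch.

    Heuristic sketch only; ties keep the leftmost run, and None is
    returned when the sequence contains no purines.
    """
    best: Optional[Tuple[int, str]] = None
    for match in re.finditer(r"[AG]+", seq.upper()):
        if best is None or len(match.group()) > len(best[1]):
            best = (start + match.start(), match.group())
    return best


# HXB2 positions 9061-9120, copied from the restriction-map listing.
window = "GCCACTTTTTAAAAGAAAAGGGGGGACTGGAAGGGCTAATTCACTCCCAAAGAAGACAAG"
print(longest_purine_run(window, start=9061))  # (9071, 'AAAAGAAAAGGGGGGA')
```

The run found contains AAAAGAAAAGGGGGG, the purine tract that precedes U3 at the 3' end of the HIV-1 genome. Scanning a window in pol the same way would be one way to spot the central copy discussed in this paper.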

We have tested here the hypothesis that the HIV central 
PPT represents an additional initiation site for the synthesis 
of the plus strand of HIV DNA. We show that HIV linear 
unintegrated DNA molecules carry a discrete single- 
stranded gap whose downstream limit is the last nucleotide 
of the central PPT, indicating that this structure is likely used 
to initiate plus-strand synthesis at the center of the genome. 
We also show that the single-stranded gap is absent from 
circular molecules. Since it is known that such circular 
molecules are formed only after transport of reverse tran- 
scription products into the nucleus, we propose that nuclear 
ligation and DNA repair events result in both the closing of 
circular DNA molecules and filling of the gap. 



MATERIALS AND METHODS 

Cells and viruses. MT4 cells were a gift from M. David 
Hogan, Laboratory of Molecular Microbiology, National 
Institute of Allergy and Infectious Diseases, Bethesda, Md.
These cells, which are transformed by human T-cell leuke- 
mia virus type I, were shown to allow acute cytopathic 
HIV-1 infection (13). CEM clone 13 cells (28) were derived 
from the human lymphoid cell line CEM (ATCC CCL119) 
and express high levels of CD4 antigen. Cells were main- 
tained in RPMI 1640 medium (GIBCO Laboratories) supple- 
mented with 10% fetal calf serum. 
The viral isolate used in our experiments was HIV-1BRU (2)



* Corresponding author. 




2416 CHARNEAU AND CLAVEL



J. Virol. 



[Figure 1 image not reproduced: HIV-1 genome map labeling gag, vif, vpr, and tat]



FIG. 1. Positions of the two PPTs on the HIV-1 genome. Shaded areas in the pol open reading frame indicate the regions coding for
different functions. pbs, primer binding site, representing the initiation site for minus-strand synthesis.



recovered following transfection of COS cells with infectious
proviral molecular clone pBRU-2 (24a). Cells were infected
at a multiplicity of 1:10 (1 50% tissue culture infective dose
per 10 cells) with virus from a frozen (-80°C) stock produced
on MT4 cells and titrating 8 x 10^ tissue culture
infective doses per ml on MT4 cells. Following infection, 
cultures were monitored for cytopathic effect, which in the 
described conditions of infection appeared approximately 3 
days after infection. At that time, cells were harvested for 
DNA isolation. 

Analysis of viral DNA. Low-molecular-weight DNA was
extracted from infected cells by Hirt extraction (15). Nuclease
S1 (Appligene, Strasbourg, France) was used at 1.5 U/µg
of DNA after addition of 1:10 volume of 10× S1 buffer (300
mM sodium acetate [pH 4.6], 500 mM NaCl, 10 mM ZnCl2)
and incubated at 37°C. For double digestions with a restriction
enzyme and nuclease S1, DNA (10 µg) was first digested
with the restriction enzyme; then 10× nuclease S1 buffer and
nuclease S1 (15 U) were added to the reaction mixture, and
the sample was further incubated at 37°C. DNA was then
subjected to electrophoresis on 1% agarose gels that did not
contain ethidium bromide and analyzed by Southern blotting
(34).

The nucleotide position numbers used here to describe 
DNA fragments start at the first nucleotide of a linear HIV 
DNA genome (5' end of the U3 region in the 5' LTR). 

Two probes were used in hybridization experiments. The 
5' probe was a PstI fragment spanning the gag region from
positions 1415 to 2839, and the 3' probe was a KpnI fragment
spanning the env region from positions 6343 to 9005. Both 
probes were obtained from HIV-1 molecular clone pNL4-3 
(1) and labeled by the random hexamer method (6). 

Primer extension. A modification of the primer extension 
technique was used, based on Taq DNA polymerase, which 
allows multiple cycles of denaturation, annealing, and polymerization.
The reaction mix contained 10 µg of Hirt DNA,
0.2 mM each deoxynucleotide, 1.25 U of Taq DNA polymerase
(Perkin-Elmer Cetus N801-0045), 10 pmol of 5'-end-labeled
primer, and 0.5 µl of PMPE reagent (Stratagene) in
10 mM Tris-HCl (pH 8.3)-50 mM KCl-2 mM MgCl2-0.01%
gelatin, for a total reaction volume of 50 µl. The first cycle
included 30 s for denaturation at 92°C, 1 min for annealing at
50°C, and 1 min for polymerization at 72°C. The following 59
cycles included 10 s at 92°C, 1 min at 50°C, and 1 min at
72°C. As only one primer was included in the reaction, this
multicycle primer extension method is distinct from a polymerase
chain reaction and resulted in linear, not exponential,
amplification of the reaction product.
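The difference between this single-primer method and PCR can be made concrete with idealized copy counts. The sketch below assumes 100% efficiency in every cycle and ignores primer depletion and template damage. With one primer, each cycle copies each template strand at most once. A two-primer PCR ideally doubles the target every cycle. The helper names are hypothetical.

```python
def single_primer_copies(cycles: int, templates: int = 1) -> int:
    """Idealized labeled products after `cycles` with one primer.

    Each cycle copies each original template strand once; the products have
    the same sense as the primer, so they cannot serve as its templates.
    """
    return templates * cycles


def pcr_copies(cycles: int, templates: int = 1) -> int:
    """Idealized target copies after `cycles` of two-primer PCR (doubling)."""
    return templates * 2 ** cycles


total_cycles = 1 + 59  # first cycle plus the 59 following cycles described above
print(single_primer_copies(total_cycles))  # 60
print(pcr_copies(total_cycles))            # 1152921504606846976
```

The linear count is what keeps the band intensities interpretable. Product accumulates in proportion to the number of template molecules carrying the stop, rather than being amplified exponentially.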



Oligonucleotide primers were 5' end labeled with T4 
polynucleotide kinase in the presence of [γ-32P]ATP to a
specific activity of 5 x 10^ to 10* cpm/10 pmol. Primer pol-1 
(5'-ACA ATC ATC ACC TGC CAT CTG) anneals to the 
plus strand at position 5085. Primer pol-2 (5'-TCC AAA 
GTG GAT CTC TGC TGT) anneals to the plus strand at
position 4951. 
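Both primers anneal to the plus strand and are extended toward lower plus-strand coordinates. A strand interruption at one fixed position should therefore give labeled products whose lengths differ by exactly the spacing of the primers' 5' ends (5105 - 4971 = 134 nt). This is what lets the two primers cross-check each other.

The sketch below makes several assumptions:
- The annealing spans are those given in the Fig. 4 legend (pol-1, 5085-5105; pol-2, 4951-4971).
- The garbled "CrrC" triplet in pol-2 is read as CTC.
- The labeled 5' end of each antisense primer sits at the upper end of its span.

The example template end is hypothetical. It is chosen so that pol-2 lies 135 nt downstream of an assumed PPT end, and is not a measured value. Names are illustrative.

```python
# Primer sequences (5'->3') and plus-strand annealing spans from the text;
# reading "CrrC" in pol-2 as CTC is an assumption.
PRIMERS = {
    "pol-1": ("ACAATCATCACCTGCCATCTG", 5085, 5105),
    "pol-2": ("TCCAAAGTGGATCTCTGCTGT", 4951, 4971),
}


def product_length(primer: str, first_template_base: int) -> int:
    """Length of the 5'-labeled product if extension runs off the plus-strand
    template at `first_template_base` (the lowest plus-strand position copied).

    The labeled 5' end pairs with the upper end of the annealing span, and
    polymerization proceeds toward lower plus-strand coordinates.
    """
    _, _, high = PRIMERS[primer]
    return high - first_template_base + 1


for name, (seq, low, high) in PRIMERS.items():
    assert len(seq) == high - low + 1  # each primer is 21 nt, matching its span

hypothetical_first_base = 4951 - 135 + 1  # illustrative only: 4817
lengths = {name: product_length(name, hypothetical_first_base) for name in PRIMERS}
print(lengths)
print(lengths["pol-1"] - lengths["pol-2"])  # 134 for any stop position
```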

The same oligonucleotide primers were used to generate a 
sequence ladder from a single-stranded M13 template carrying
the HIV-1BRU plus strand from positions 4688 to 5129.
Sequencing was done by the method of Sanger et al. (29). 
Primer extension and sequence reactions were analyzed in 
parallel on a 6% polyacrylamide-8 M urea sequencing gel. 

RESULTS 

A single-stranded gap in HIV-1 unintegrated DNA. We
hypothesized that the existence of an additional plus-strand 
initiation site in the HIV-1 genome could be revealed by the 
presence of a single-stranded region in HIV-1 unintegrated 
DNA molecules. To examine this possibility, low-molecular- 
weight DNA was selectively extracted (15) from acutely 
HIV-1-infected MT4 cells (13), treated with nuclease S1, and
analyzed by Southern blotting with a probe spanning most of 
the 3' half of the HIV-1 genome. Figure 2 shows the kinetics 
of nuclease S1 digestion of HIV-1 unintegrated DNA. In
undigested DNA, three bands were observed. The position 
of the middle band, around 9.5 kb, is consistent with linear 
full-length molecules, and the position of the lower band, 
around 6 kb, is consistent with closed (supercoiled) circles.
The upper band, which has an apparent size of approximately
15 kb, is likely to be open (relaxed) circular molecules.
After 5 min of digestion with nuclease S1 (15 U/10 µg
of DNA), major changes occurred. The band corresponding 
to linear molecules turned into a doublet; the upper band had 
the size of intact linear molecules (9.5 kb), and the lower
one, at approximately 9 kb, was consistent with the size of 
linearized circles with one LTR. Simultaneously, the closed- 
circles 6-kb band completely disappeared, while the inten- 
sity of the open-circles band increased slightly. More strik- 
ingly, a new 5-kb band appeared, corresponding to the 3' half 
of linear molecules cut into two fragments by nuclease S1.
The other fragment, of similar size, could be detected in a 
distinct hybridization experiment with a 5' probe (data not 
shown). This finding indicates the presence of a unique 
nuclease S1-sensitive site in HIV unintegrated DNA molecules
located approximately at the center of the genome.

The complete disappearance of the closed circles, even
after 5 min of nuclease S1 treatment, can best be explained
by a previously described high sensitivity of supercoiled
circular DNA molecules to nuclease S1 (19). Therefore, we
assumed that at this stage of incomplete nuclease S1 digestion,
the closed circles were cleaved at random sites, into
open circles if only one strand was cut (this explains the
increase in open circles) and into linear molecules when S1
nuclease further cleaved at the resulting nicks.

FIG. 2. Kinetics of nuclease S1 action on HIV-1 unintegrated
DNA. Shown is a Southern blot of low-molecular-weight DNA from
HIV-1-infected MT4 cells, hybridized to a probe representing most
of the 3' half of the HIV-1 genome (see Fig. 3B). Lanes: NT, no
nuclease S1 treatment; 5, 15, 30, and 60, treatment with S1 nuclease
(1.5 U/µg of DNA) for 5, 15, 30, and 60 min, respectively. O, open
circles; L, linear molecules; C, closed circles.

With further digestion, the open-circles band gradually 
disappeared, together with the full-length linear molecules, 
while the band corresponding to linearized one-LTR circles 
remained unaffected. The gradual disappearance of open 
circles was due to their cleavage at random nicks that relate 
to their relaxed physical state: S1 cleavage at those sites
linearized them. Meanwhile, linear molecules were further
cleaved at the central S1-sensitive site, generating more of
the probe-reactive 5-kb fragment: after 1 h of treatment, very
little of the full-length linear molecules remained. 

The central single-stranded gap is discrete and unique. We 
further attempted to localize and define the structure of the 
HIV-1 central nuclease S1-sensitive site. Figure 3A shows a
Southern blot analysis of nuclease S1 cleavage products of
HIV-1 unintegrated DNA, obtained from acutely HIV-1-infected
MT4 cells, combined with digestion by two different
restriction enzymes. In these experiments, complete restriction
enzyme digestions were performed, followed by partial
nuclease S1 digestion (20 min with 15 U/10 µg of DNA). The
digestion products were examined by hybridization to two 
different HIV-1 probes: a 5' probe, in the gag region, 
spanning from nucleotide positions 1415 (from the start of 
U3) to 2839, and a 3' probe, spanning most of the env and nef 
regions, from nucleotide positions 6343 to 9005. 

In the HIV-1BRU genome, PstI has a unique site in the gag
coding region at position 1415. Digestion of HIV-1 unintegrated
DNA with PstI (Fig. 3A, lanes 3) cut the linear
molecules into two fragments of 8.3 and 1.4 kb, the latter being
undetected by either of the two probes used in this experiment.
PstI also linearizes circles into a 9-kb product. Nuclease
S1 treatment of PstI digestion products (lanes 4) yielded
a 5-kb fragment reactive with the 3' probe and a 3.3-kb
fragment reactive with the 5' probe. This result establishes
that both boundaries of the central single-stranded gap are
discrete.

FIG. 3. Position of the single-stranded gap on the HIV-1 genome.
(A) Two autoradiograms from the same Southern blot,
hybridized to two different probes as indicated at the top.
Low-molecular-weight DNA from HIV-1-infected MT4 cells was
analyzed by nuclease S1 treatment either with no previous restriction
enzyme digestion (lanes 1 and 2), after digestion with PstI (lanes 3
and 4), or after digestion with BamHI (lanes 5 and 6). Lanes 1, 3, and
5, no nuclease S1 treatment; lanes 2, 4, and 6, treatment with
nuclease S1 (1.5 U/µg of DNA). (B) Positions of the corresponding
probes on the HIV-1 genome, relative to the central PPT and to the
unique PstI and BamHI restriction sites.

After digestion with BamHI (Fig. 3A, lanes 5), which has
a unique site in the env region at position 8520, the linear
DNA molecules were cleaved into two fragments of approximately
8.5 and 1.2 kb, both reactive with the 3' probe.
Treatment with nuclease S1 (lanes 6) released a 3.7-kb
fragment detected by the 3' probe and a 4.9-kb fragment
reactive with the 5' probe. The 1.2-kb fragment, which
includes the 3' PPT, remained unaffected by nuclease S1
treatment. When no restriction enzyme digestion was performed
before nuclease S1 treatment, S1 cleavage (lanes 2)
released a 5' 4.9-kb fragment and a 3' fragment of similar
size. Overall, the results from these single- and double-digestion
experiments were consistent and could locate the
single-stranded gap around position 4900, within 0.2 kb of
the central PPT repeat found in the HIV-1BRU sequence.

These experiments show that the central single-stranded 
gap is unique in the HIV-1 genome. Indeed, the sizes of the 
fragments released by both the single and double digestions 
excluded the presence of another gap. 
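The localization reduces to simple arithmetic. Each S1-released fragment that has one end at a known point implies a gap position. The known points are the PstI site at 1415, the BamHI site at 8520, or the 5' end of the linear genome. The sketch below assumes the gel sizes are exact and counts positions from the start of U3 in the 5' LTR, as in Materials and Methods. The helper name is hypothetical.

```python
PSTI_SITE = 1415   # unique PstI site in gag (from the text)
BAMHI_SITE = 8520  # unique BamHI site in env (from the text)


def implied_gap_positions() -> dict:
    """Gap position (nt) implied by each S1-released fragment size (bp)."""
    return {
        # 5'-probe fragment of uncut or BamHI-cut linear DNA: genome start -> gap.
        "S1 alone / BamHI + S1, 4.9-kb 5' fragment": 4900,
        # 5'-probe fragment after PstI: PstI site -> gap.
        "PstI + S1, 3.3-kb 5'-probe fragment": PSTI_SITE + 3300,
        # 3'-probe fragment after BamHI: gap -> BamHI site.
        "BamHI + S1, 3.7-kb 3'-probe fragment": BAMHI_SITE - 3700,
    }


estimates = implied_gap_positions()
for label, position in estimates.items():
    print(f"{label}: ~{position}")
spread = max(estimates.values()) - min(estimates.values())
print(f"spread: {spread} nt")  # 185 nt, i.e. within 0.2 kb
```

The three independent estimates (4715, 4820, and 4900) agree to within about 0.2 kb. Under these assumptions, this is the agreement that supports placing a single gap near the center of the genome.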

The 3' boundary of the gap is the central copy of the PPT. 
To locate the 3' boundary of the single-stranded gap more
precisely relative to the central PPT found in all lentiviral
genomes, primer extension experiments were conducted,
using low-molecular-weight DNA from acutely HIV-1-infected
lymphoid cells as a template. We used oligonucleotide
primers complementary to the viral plus strand, located
downstream of the central copy of the PPT, expecting
elongation of these primers to stop precisely at any interruption
of the plus strand. To increase the sensitivity, the
enzyme used was Taq polymerase, which allowed us to heat
denature extension reaction mixes and to carry out several
cycles of extension for each reaction. Two different antisense
5'-end-labeled oligonucleotide primers were used.
Primer pol-2 anneals to the plus strand 135 nucleotides
downstream of the PPT, and primer pol-1 anneals 134
nucleotides further downstream. In parallel with extension
reactions, DNA sequence reactions were conducted, using
the same oligonucleotide primers on single-stranded M13-HIV-1BRU
templates, enabling us to precisely locate the stop
in primer extension on the HIV-1BRU genome (Fig. 4). With
both primers we observed a single, discrete stop in primer
extension that coincided with the last nucleotide of the
central PPT. No equivalent stop was found with low-molecular-weight
DNA from uninfected cells. In addition, no
HIV-specific signal could be observed with a primer complementary
to the minus strand, located upstream of the
central PPT (data not shown). This result demonstrates that
the 3' boundary of the HIV-1 central single-stranded gap is
defined by the central copy of the PPT. We can infer from
this observation that the central PPT is used as an additional
initiation site for the synthesis of the plus strand of HIV-1
DNA.

FIG. 4. Localization of the downstream limit of the single-stranded
gap by oligonucleotide primer extension. Two primer
extension reactions with two different oligonucleotide primers are
presented. Primer pol-1 anneals to the HIV-1 plus strand at positions
5085 to 5105, and primer pol-2 anneals at positions 4951 to 4971.
Both oligonucleotides were 32P 5' end labeled and allowed to prime
an extension reaction, using low-molecular-weight DNA from HIV-1-infected
cells (lanes HIV) or noninfected cells (lanes NI) as
templates as described in Materials and Methods. Lanes A, G, C,
and T are sequence reactions from a single-stranded M13 template
carrying the HIV-1BRU plus strand from positions 4688 to 5129, using
the indicated oligonucleotide as a primer. ppt, central copy of the
PPT.

The single-stranded gap is found exclusively on linear DNA 
molecules. Electrophoretic analysis of undigested and of 
PstI- or BamHI-digested unintegrated HIV-1 DNA showed
an approximately equal proportion of linear and circular
molecules in acutely HIV-infected MT4 cells. Indeed, after
PstI or BamHI digestion, the intensity of the shortened
product resulting from digestion of linear molecules was
approximately equal to that of the 9-kb linearized circles,
found in both BamHI and PstI digestion reactions. Of
interest is that no band with a size similar to that of the 
full-length linear molecules (corresponding to two-LTR lin- 
earized circles) could be observed in these reactions. This 
means that the vast majority of circular molecules in the 
acutely HIV-1-infected MT4 system have only one LTR.

Several lines of evidence indicate that circular molecules 
do not carry a single-stranded gap. First, the closed circles 
by definition cannot be gapped, although they are highly 
sensitive to nuclease S1 digestion (19). Second, the S1
digestion kinetics experiment (Fig. 2) shows that the 9-kb
product corresponding to the linearized one-LTR circles was
still present, even after 1 h of treatment with nuclease S1,
when native linear molecules had almost completely disappeared.
Third, after digestion with PstI and BamHI, which
cut only once in the HIV-1BRU genome, nuclease S1 treatment
did not generate products of a size compatible with fragments
released from gapped circular molecules. For example,
the BamHI-nuclease S1 double digestion failed to release
a 5.5-kb fragment, which would span from the BamHI
site to the gap in a circular one-LTR molecule and would be
reactive with the 5' probe. Similarly, the PstI-nuclease S1
double digestion did not release any 3'-probe-reactive product
of 5.7 kb, which would be the distance between the gap
site and the PstI site in a circular one-LTR molecule.
Overall, the products observed in the double digests were of 
a size compatible only with DNA fragments released from 
native, linear two-LTR molecules. 
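The argument that gapped circles would have yielded different products can be checked with a small model that cuts a linear or a circular molecule at given positions. The sketch below rests on the following assumptions, all approximations chosen to match the band sizes reported above rather than values stated by the authors:
- The linear genome is about 9.7 kb (the sum of the 8.5- and 1.2-kb BamHI fragments).
- The one-LTR circle is about 9.1 kb and keeps the linear coordinates of internal sites.
- The gap lies at about position 4900.

The function name is hypothetical.

```python
from typing import List


def fragment_sizes(length: int, cuts: List[int], circular: bool) -> List[int]:
    """Sizes (bp) produced by cutting a molecule of `length` at `cuts`.

    Positions are offsets along the molecule; a circle needs at least one
    cut to yield linear pieces.
    """
    cuts = sorted(cuts)
    if circular:
        pieces = [b - a for a, b in zip(cuts, cuts[1:])]
        pieces.append(length - cuts[-1] + cuts[0])  # piece spanning the junction
    else:
        bounds = [0] + cuts + [length]
        pieces = [b - a for a, b in zip(bounds, bounds[1:])]
    return sorted(pieces)


LINEAR_LEN = 9700  # assumption: ~8.5 + 1.2 kb BamHI fragments
CIRCLE_LEN = 9100  # assumption: one-LTR circle, ~9 kb when linearized
GAP, PSTI, BAMHI = 4900, 1415, 8520

print(fragment_sizes(LINEAR_LEN, [BAMHI, GAP], circular=False))  # [1180, 3620, 4900]
print(fragment_sizes(CIRCLE_LEN, [BAMHI, GAP], circular=True))   # [3620, 5480]
print(fragment_sizes(CIRCLE_LEN, [PSTI, GAP], circular=True))    # [3485, 5615]
```

The linear case approximately reproduces the 4.9-, 3.7-, and 1.2-kb bands. The junction-spanning circle pieces (about 5.5 and 5.6 kb here) are close to the 5.5- and 5.7-kb products the authors looked for and did not find.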

Finally, we have observed that the proportion of circular 
versus linear molecules could vary with the source of the 
analyzed DNA. Figure 5 shows a comparison of nuclease S1
treatment of low-molecular-weight DNA from MT4 cells 
harvested 2 days after HIV-1 infection (lanes 1 to 4) and from 
CEM cells harvested 9 days after infection (lanes 5 to 8). In 
the CEM cells, the proportion of linear molecules was low, 
as shown by the lower proportion of 9.5-kb band than of 
open and closed circles on undigested lane 5. Another sign of 
the low proportion of linear molecules is visible on lane 7, 
where the intensity of the 1.2-kb fragment released by 
BamHl is weaker than that of the 9-kb linearized circles. 
Coincidentally, the amount of subgenomic product released 
by nuclease SI digestion of the same DNA samples was low. 
In the MT4 cells, in which more linear molecules were 
found, the quantity of subgenomic SI digestion product was 
proportionally much higher than in the CEM cells. 

DISCUSSION 

We have observed that HIV-1 unintegrated linear DNA molecules, which accumulate in acutely infected CD4+ lymphoid cells, carry a short, unique, single-stranded region at the center of the genome. Both ends of this single-stranded gap are discrete. We have established that the 3' boundary of this single-stranded element coincides with a PPT found at position 4800 in the HIV genome, an exact repeat of the structure normally found in all retroviruses next to the U3 region of the LTR (at position 9070 in the HIV-1 provirus). We have not determined the 5' boundary of the gap and therefore cannot be certain of its size. However, the sizes of the fragments released from nuclease S1-treated linear molecules after restriction enzyme digestion, corresponding to either the 3' or the 5' side of the gap, suggest that the gap is short, probably less than 100 nucleotides in length.

FIG. 5. Nuclease S1 sensitivity of HIV-1 unintegrated DNA from two different cell cultures. Lanes: 1 to 4, low-molecular-weight DNA from MT4 cells harvested 2 days after infection; 5 to 8, low-molecular-weight DNA from CEM cells harvested 9 days after infection. Unintegrated HIV-1 DNA was analyzed by nuclease S1 treatment either with no previous restriction enzyme digestion (lanes 1, 2, 5, and 6) or after digestion with BamHI (lanes 3, 4, 7, and 8). Lanes 1, 3, 5, and 7, no nuclease S1 treatment; lanes 2, 4, 6, and 8, treatment with nuclease S1 (1.5 U/µg of DNA). Both Southern blots were hybridized to the 3' probe shown in Fig. 3B. Both autoradiograms are overnight exposures.

In other retroviral models, the PPT has been shown to determine in vitro the initiation site of retroviral DNA plus-strand synthesis (20, 23, 27, 32). The DNA genomes of hepatitis B and cauliflower mosaic viruses, which are synthesized through reverse transcription of an RNA template, carry several single-stranded structures. In particular, there are two plus-strand discontinuities in the cauliflower mosaic virus genome that are defined by short PPTs that likely correspond to plus-strand initiation sites (25, 42). Our findings indicate that in HIV-1, and most likely also in other lentiviruses, the central PPT is used in vivo as an additional priming site for plus-strand synthesis. Indeed, the primer extension experiments shown here reveal that the plus-strand DNA 3' to the gap starts exactly at the last nucleotide of this PPT. This finding establishes that this structure determines precise and specific priming of plus-strand DNA at a central position in the genome. Because we used Taq polymerase and not reverse transcriptase in the primer extension experiments, we could not determine whether the RNA primer corresponding to the central PPT remained attached to the nascent DNA strand, as described for other retroviruses at the 3' PPT (7, 26).

FIG. 6. Model for HIV reverse transcription. 1, Minus strong-stop synthesis; 2, first (minus-strand) template transfer and plus-strand strong-stop synthesis, initiated at the 3' PPT; 3, progression of minus-strand synthesis and initiation of plus-strand synthesis at the central PPT; 4, second (plus-strand) template transfer; 5, formation of the LTRs by strand displacement and synthesis, and progression of synthesis of the 5' half of the plus strand; 6, linear gapped DNA molecule; 7, following step 4, ligation at both boundaries of the LTR, before strand displacement and synthesis can start, and progression of plus-strand synthesis; 8, ligation at the gap and formation of a one-LTR closed circle.

It has been shown in avian retroviruses that plus-strand synthesis can start at sites distinct from the 3' PPT, resulting in plus-strand discontinuities (16, 18). However, these initiation sites do not seem to be unique or well defined, and they do not result in a discrete single-stranded structure comparable to the one we describe here. It is likely that the resulting plus-strand segments are eliminated by strand displacement events, as was shown in melittin-permeabilized virions (4). In HIV, such a strand displacement is likely to occur at the 3' PPT, generating the 5' end of linear molecules (Fig. 6, steps 4 and 5), but it seems not to occur at the central PPT. We cannot explain why the upstream limit of the gap remains discrete. The gapped linear molecules represent a defined species: they are full-length, double-stranded molecules in which synthesis of the 5' half of the viral plus strand is stopped near the initiation site of the 3' half. It is possible, however, that, as in cauliflower mosaic virus (25), elongation of the upstream HIV plus strand engages in a brief strand displacement event. Further studies are needed to clarify this point.

As a whole, our findings support a model of retroviral DNA synthesis (Fig. 6) in which linear molecules require a strand displacement step for LTR synthesis (4, 26), whereas one-LTR circles result from ligation events. Indeed, closed circular DNA molecules, which are found exclusively in the nucleus (17, 31), could be generated by ligation at both boundaries of the LTR and at the gap (Fig. 6, steps 7 and 8), following the proposed "intrastrand" plus-strand template transfer (24). This, in turn, could result from early nuclear transport of incompletely synthesized DNA molecules.

Finally, it remains to be understood why lentiviruses, unlike other retroviruses, have developed and conserved a repeat of the PPT at the center of the genome. The most likely explanation is that it allows plus-strand synthesis to progress before elongation of the minus strand is complete (Fig. 6, step 3), probably saving time in DNA synthesis. The precise location of the PPT repeat at the center of the genome supports this hypothesis: an additional plus-strand initiation at this site could allow its elongation to reach the 3' PPT at approximately the same time that minus-strand synthesis is completed, the latter being required for plus-strand template transfer (Fig. 6, step 4). The LTRs can then be synthesized as plus-strand synthesis is being completed in the 5' half of the genome (Fig. 6, step 5). Since lentiviruses, which are not transforming viruses, rely essentially on reverse transcription for their propagation, this feature could constitute an evolutionary advantage. The initiation of HIV plus-strand synthesis at the center of the genome is also interesting in view of recent observations that in unstimulated normal human lymphocytes, synthesis of the HIV minus strand appears to be arrested approximately halfway along the genomic RNA template (44). Dependence of full-length DNA synthesis on cell growth has also been described for avian viruses (9, 40). If the described stop in HIV minus-strand synthesis is located beyond the central PPT, it may allow early synthesis of a plus strand covering the whole 3' half of the genome and may lead to faster completion of double-stranded full-length DNA molecules upon further lymphocyte activation.
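The timing argument in this paragraph can be made concrete with a deliberately crude model. The Python sketch below is an illustrative addition, not a model from this study. It assumes that the genome length is normalized to 1, that minus- and plus-strand synthesis proceed at the same constant rate, that the central PPT sits exactly at the midpoint and primes plus-strand synthesis as soon as the minus strand has copied past it, and that strand-transfer, RNase H, and LTR-synthesis delays can be ignored. The helper name `completion_times` is hypothetical.

```python
"""Toy timing model for the 'gain of time' hypothesis (illustrative only).

Assumptions: genome length normalized to 1, equal constant rates for minus-
and plus-strand synthesis, central PPT exactly at the midpoint and used as
soon as the minus strand has passed it, transfer and LTR delays ignored.
"""
from fractions import Fraction

GENOME = Fraction(1)   # genome length, normalized
CENTER = GENOME / 2    # central PPT assumed exactly at the midpoint
RATE = Fraction(1)     # same rate for minus and plus strands (assumption)


def completion_times(central_ppt: bool) -> dict:
    """Times at which each synthesis step ends in this toy model."""
    # The minus strand is copied from the 3' end of the RNA to its 5' end.
    minus_done = GENOME / RATE
    if central_ppt:
        # The minus strand passes the center at CENTER/RATE; the plus strand
        # primed there then copies the 3' half up to the 3' PPT.
        reach_3ppt = CENTER / RATE + (GENOME - CENTER) / RATE
        # The second template transfer needs a complete minus strand; after
        # it, only the 5' half of the plus strand remains to be copied.
        transfer = max(minus_done, reach_3ppt)
        plus_done = transfer + CENTER / RATE
    else:
        reach_3ppt = None
        # Only the 3' PPT primes: after the transfer, the plus strand must
        # copy the whole genome.
        plus_done = minus_done + GENOME / RATE
    return {"minus_done": minus_done, "reach_3ppt": reach_3ppt, "plus_done": plus_done}


with_cppt = completion_times(central_ppt=True)
without_cppt = completion_times(central_ppt=False)
print(f"central PPT: plus strand done at t = {with_cppt['plus_done']}")
print(f"3' PPT only: plus strand done at t = {without_cppt['plus_done']}")
print(f"fraction of time saved: {1 - with_cppt['plus_done'] / without_cppt['plus_done']}")
```

Under these assumptions, the strand primed at the central PPT reaches the 3' PPT at t = 1, the same moment the minus strand is finished, and the plus strand is complete at t = 3/2 rather than t = 2 when only the 3' PPT is used. Real polymerization rates, pausing, and the arrest of minus-strand synthesis in resting lymphocytes would change these numbers.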